INTRODUCTION TO COMPUTER IMAGE PROCESSING

Johannes G. Moik 
Computer Sciences Corporation 


ABSTRACT 


This report presents theoretical backgrounds and digital techniques for
a class of image processing problems. It is intended to provide the
knowledge needed to use the VICAR and SMIP systems in operation at the
Laboratory for Meteorology and Earth Sciences.

Generally two kinds of image processing can be distinguished. One kind
transforms images into new images that differ from the original in
some desirable way. Image restoration and image enhancement belong
to this group, and the objective is to compensate for distortions
introduced by the image forming process. Since image formation can be
described or approximated by linear processes, two-dimensional linear
system theory and linear transformations provide the theoretical
framework for image restoration and image enhancement.


The result of the other kind of image processing is not an image but may 
take the form of a description or parametrization. Curve detection, ob- 
ject extraction, area determination and classification are problems in 
this class of image processing where linear transformations, statistics, 
graph theory and heuristics provide useful mathematical methods. 


This report is mainly devoted to the first kind of image processing. 

Image formation in the context of linear system theory, image evalua- 
tion, noise characteristics, mathematical operations on images and their 
implementation are discussed in chapter 2. Various techniques for image 
restoration and image enhancement are presented in chapters 3 and 4 
respectively. Chapter 5 describes methods for object extraction and the 
problem of pictorial pattern recognition and classification is briefly 
discussed in chapter 6. 


1. INTRODUCTION 

Image processing in a broad sense deals with the manipulation of data which are 
inherently two-dimensional in nature. The purpose of image processing is to aid 
man to extract information from images. This implies that man is usually not 
capable of extracting all of the information in an image. The image may be 
blurred by some defect such as defocus or image motion. In addition, images 
always contain noise due to sensor characteristics, and, in some cases the prin- 
cipal degradation may be noise alone. 

The human visual system is a remarkable optical processing system with some
unique capabilities. It also has limitations, however. The need for image
processing is based upon the assumption that the human visual system is not
necessarily efficient at performing all tasks related to the extraction of
information from an image. The validity of this assumption has been supported by
experiments which have shown that for a range of image defects and noise levels
the human visual system is not an efficient information extraction system [1].

Image processing must always use some kind of a priori knowledge. Without
a priori knowledge about the characteristics of object space and the imaging
system there would be no basis for judging whether a picture is a good
representation of an object, and therefore whether there is a reason to process
the image. Thus, some form of a priori knowledge must be applied to a degraded
image in order to extract information from it. One kind of a priori knowledge is
concerned with intelligence information. Processing done on a degraded image may
be different if the source of the image is known than if it is unknown.
For example, different processing methods may be applied to high resolution
images of the earth than to images of distant astronomical objects or X-ray
pictures of the human body.

Another kind of a priori knowledge of great importance to image processing is 
concerned with the physical process of the formation of an image. This includes 
knowledge of object characteristics, transmission medium characteristics, 
properties of sensor and systems which recorded the image and possibly the 
characteristics of the scanner used to scan a photograph. For example, the cor- 
rection of photometric and geometric distortions which occur in imaging with a 
vidicon camera requires knowledge of the characteristics of the vidicon tube. 

All this information is used to reduce the number of variables involved in
processing. In fact, one kind of image processing which is often referred to as a
priori processing deals with the clever design of imaging systems to minimize
degradations. This discussion, however, will concentrate on processing of given
images without having control over how the images are formed.

The description and processing of images can be facilitated when the image 
forming processes have some kind of mathematical structure upon which a 
characterization can be based. This will help in the development of a theory
of image processing, since portions of this field as yet remain more an art
than a science. As with any emerging scientific field, heuristic techniques play
a major role in the solution of many problems. It is not the purpose of this paper 
to present heuristic approaches which would often provide encouraging results 
for a given problem, but to present theoretical backgrounds which could possibly 
provide a common basis of approach to a class of image processing problems. 

For example, the basis for most of the design technology in the field of signal 
processing is the theory of linear systems. That linear system theory is as ad- 
vanced as it is, stems from the fact that the defining properties of linear sys- 
tems guarantee that they can be analyzed. The analysis, based on the principle 
of superposition leads directly to the concepts of scanning, sampling, filtering, 
convolution, stochastic estimation, etc. Equally important is the idea that the 
mathematical structure of the information being processed be compatible with 
the structure of the physical processes to which it is exposed. 

In image processing where electrical technology and optics are a dominating 
influence, linear models have been traditionally used. This is natural since 
image processing is based on those branches of classical physics which employ 
linear mathematics as their foundation, namely electric measurements,
electronics, signal theory and communication theory. In optical image processing,
the laws of image formation and degradation are derived from linear diffraction 
theory. In the section on image formation it will be shown that the structure of 
linear systems is compatible with the structure of a large class of images 
themselves. 

For the purpose of this discussion, a system is mathematically defined to be a
mapping of a set of input functions into a set of output functions. For imaging
systems, the inputs and outputs can be real-valued functions or complex-valued
functions of a two-dimensional independent spatial variable (x,y). The systems
considered are characterized by many-one mappings. A convenient representation
of a system is a mathematical operator, $S\{\,\}$, which operates on input
functions to produce output functions. Thus, if the function $f(x_1, y_1)$ represents
the input to a system and $g(x_2, y_2)$ represents the corresponding output, then by
the definition of $S\{\,\}$, the two functions are related through

$$ g(x_2, y_2) = S\{ f(x_1, y_1) \} \tag{1} $$
The discussion of this paper will be restricted to the class of linear systems. A
system is said to be linear if the following superposition property is obeyed for
all input functions s and t and all complex constants a and b:

$$ S\{ a\,s(x_1, y_1) + b\,t(x_1, y_1) \} = a\,S\{ s(x_1, y_1) \} + b\,S\{ t(x_1, y_1) \} \tag{2} $$


The great advantage afforded by linearity is the ability to express the response 
of the system to an arbitrary input in terms of the responses to certain ele- 
mentary functions into which the input has been decomposed. Each of these 
produces a known response and by virtue of linearity the total response can be 
found as a corresponding linear combination of the responses to the elementary 
stimuli. 
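The superposition property (2) can be checked numerically for a candidate system. The following is a minimal sketch, assuming a hypothetical 3x3 mean filter with wrap-around borders as the linear system and pointwise squaring as a nonlinear counterexample; neither operator is taken from this report.

```python
import numpy as np

def box_blur(img):
    """Hypothetical linear system S{}: 3x3 mean filter with wrap-around borders."""
    out = np.zeros_like(img, dtype=float)
    for dx in (-1, 0, 1):
        for dy in (-1, 0, 1):
            out += np.roll(img, shift=(dx, dy), axis=(0, 1))
    return out / 9.0

def obeys_superposition(system, s, t, a, b):
    """Test S{a s + b t} == a S{s} + b S{t} for one pair of inputs."""
    lhs = system(a * s + b * t)
    rhs = a * system(s) + b * system(t)
    return bool(np.allclose(lhs, rhs))

rng = np.random.default_rng(0)
s = rng.random((8, 8))
t = rng.random((8, 8))

linear_ok = obeys_superposition(box_blur, s, t, 2.0, -0.5)    # mean filter
square_ok = obeys_superposition(np.square, s, t, 2.0, -0.5)   # pointwise squaring
```

A single passing pair of inputs does not prove linearity, but a single failing pair, as for the squaring operator, is sufficient to disprove it.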

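A simple and convenient decomposition of the input is offered by a property of the
δ function which states that

$$ f(x, y) = \int_{-\infty}^{\infty} \int_{-\infty}^{\infty} f(\xi, \eta)\, \delta(x - \xi, y - \eta)\, d\xi\, d\eta \tag{3} $$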


This equation may be regarded as expressing f as a linear combination of weighted
and displaced δ functions. The elementary functions of the decomposition are the
δ functions; they may physically represent an idealized point source of light. A
possible definition of the two-dimensional δ function is

$$ \delta(x, y) = \lim_{N \to \infty} N^2 e^{-N^2 \pi (x^2 + y^2)} \tag{4} $$


To find the response of a system to the input f, substitute (3) in (1):

$$ g(x_2, y_2) = S\left\{ \int_{-\infty}^{\infty} \int_{-\infty}^{\infty} f(\xi, \eta)\, \delta(x_1 - \xi, y_1 - \eta)\, d\xi\, d\eta \right\} \tag{5} $$


Regarding the number $f(\xi, \eta)$ as simply a weighting factor applied to the
elementary function $\delta(x_1 - \xi, y_1 - \eta)$, the property (2) is invoked for linear systems
to allow $S\{\,\}$ to operate on the individual elementary functions. This yields

$$ g(x_2, y_2) = \int_{-\infty}^{\infty} \int_{-\infty}^{\infty} f(\xi, \eta)\, S\{ \delta(x_1 - \xi, y_1 - \eta) \}\, d\xi\, d\eta \tag{6} $$


Now let the symbol $h(x_2, y_2; \xi, \eta)$ denote the response of the system at point
$(x_2, y_2)$ of the output space to a δ function input at coordinates $(\xi, \eta)$ of the input
space:

$$ h(x_2, y_2; \xi, \eta) = S\{ \delta(x_1 - \xi, y_1 - \eta) \} \tag{7} $$

The function h is called the impulse response of the system. The system
input and output can now be related by the simple equation

$$ g(x, y) = \int_{-\infty}^{\infty} \int_{-\infty}^{\infty} f(\xi, \eta)\, h(x, y; \xi, \eta)\, d\xi\, d\eta \tag{8} $$


This fundamental expression demonstrates the important fact that a linear sys- 
tem is completely characterized by its response to unit impulses. For the case 
of a linear imaging system, this result has the interesting physical interpreta- 
tion that the effects of imaging elements can be fully described by the images 
of point sources located throughout the object field [2].
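A discrete analogue of the superposition integral (8) illustrates this characterization. The sketch below assumes a hypothetical linear, space-variant system (a one-pixel horizontal smear followed by a row-dependent gain) that is probed with unit impulses at every input position; the function names are illustrative only.

```python
import numpy as np

def system(img):
    """Hypothetical linear, space-variant system: smear along y, then row-dependent gain."""
    smeared = (img + np.roll(img, 1, axis=1)) / 2.0
    rows = np.arange(img.shape[0]).reshape(-1, 1)
    return smeared * (1.0 + 0.1 * rows)

def impulse_responses(system, shape):
    """h[x, y, xi, eta]: response at (x, y) to a unit impulse at (xi, eta)."""
    n_x, n_y = shape
    h = np.zeros((n_x, n_y, n_x, n_y))
    for xi in range(n_x):
        for eta in range(n_y):
            delta = np.zeros(shape)
            delta[xi, eta] = 1.0
            h[:, :, xi, eta] = system(delta)
    return h

def superpose(f, h):
    """Discrete form of (8): g[x, y] = sum over (xi, eta) of f[xi, eta] h[x, y, xi, eta]."""
    return np.einsum("ab,xyab->xy", f, h)

rng = np.random.default_rng(1)
f = rng.random((5, 6))
h = impulse_responses(system, f.shape)
g_from_h = superpose(f, h)   # output predicted from the impulse responses alone
g_direct = system(f)         # output obtained by applying the system directly
```

The two outputs agree to rounding error, which is the discrete statement that the set of point-source images fully describes the system.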

The techniques of image processing find applications in many areas, notably: 
image restoration, image enhancement, object extraction, pictorial pattern 
recognition and the efficient encoding of pictures for transmission and storage. 
This paper will only discuss the various methods for image restoration, image 
enhancement and object extraction. The common questions underlying these 
areas are: 

1) How can images be represented and described? 

2) How are images formed? 

3) What mathematical operations can be used on images?

4) How can these mathematical operations be implemented? 

5) How can image quality be evaluated? 

6) What is the effect of noise on images? 

In the next chapter these questions will be elaborated upon. 
2. IMAGE PROCESSING CONCEPTS


2.1 IMAGE REPRESENTATION AND DESCRIPTION 

Representation is an important question in the transmission, storage and
processing of any information. A physical image, as a carrier of information, is
represented in the form of light energy. This natural representation can be used by
creating a signal proportional to the intensity of that energy. This is common in
television and digital image processing, whereas the photographic process does
not use the representation by light intensity.

Mathematically an image can be described by a function of two real spatial
variables. The value of such a picture function f(x,y) at a given point is called the
grey value. In a black and white picture this value is determined by one parameter,
the light intensity at the given point. Color pictures are described by several
parameters. Any physical picture is of finite extent; hence, one can suppose that
any picture function is zero outside a rectangle R of prespecified size. Since the
amount of light reaching the observer from a physical picture is finite and
nonnegative, one can assume that there is an upper bound M to the possible
brightness of any physical picture. Thus, any picture function is bounded and
nonnegative and will be described by

$$ 0 \le f(x, y) \le M \quad \text{for all } (x, y) \in R, \qquad R = \{ (x, y) \mid 0 \le x \le x_M,\ 0 \le y \le y_N \} \tag{9} $$


The raster-scan operation of scanning instruments and image digitizers imposes
a sequential row structure on images. The orientation of the coordinate system 
used in this paper is shown in Figure 1 where the x-axis is in the direction of 
increasing line numbers. 

Figure 1. Image Domain

For color pictures the definition of a picture function must be extended to
include the color information. This can be done by taking advantage of the fact
that the space of all possible colors can be spanned by a three-dimensional
vector space. This fact, which rests on physiological rather than physical
evidence, allows a picture function to be defined as a vector-valued function
$\mathbf{f}(x,y) = (f_1(x,y), f_2(x,y), f_3(x,y))$, where $f_1$, $f_2$, $f_3$ give the coordinates
of the color of the picture at the point (x,y). Since the parametrization of the color space
is not unique, the basis vectors must be specified to define a picture function
completely. One basis is the set of the three primary colors, red, green and
blue. Physically, using this basis corresponds to taking three scalar pictures
of a single scene using, successively, red, green and blue filters. Consequently,
the image is represented by three intensity matrices $f_R(x,y)$, $f_G(x,y)$ and $f_B(x,y)$,
where the subscripts denote red, green and blue primary components.

Normalized color coordinates of a picture point can be defined by

$$ f_r = \frac{f_R}{f_R + f_G + f_B}, \qquad f_g = \frac{f_G}{f_R + f_G + f_B}, \qquad f_b = \frac{f_B}{f_R + f_G + f_B} $$

These numbers are independent of x and y and are called chromaticity coordi- 
nates or trichromatic coefficients. They sum to unity and two of them can be 
taken to define the normalized color coordinates at a point in the picture. 
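The normalization can be sketched for three sampled intensity matrices as follows. This is a minimal illustration, assuming NumPy arrays for the red, green and blue components; the treatment of completely black picture points (all three coordinates set to zero) is an assumption made here and is not specified in the text.

```python
import numpy as np

def chromaticity(f_red, f_green, f_blue):
    """Normalized color coordinates (f_r, f_g, f_b) at every picture point."""
    total = f_red + f_green + f_blue
    # Assumption: where the total intensity is zero the ratio is undefined,
    # so the division is guarded and all three coordinates come out as zero.
    safe = np.where(total > 0, total, 1.0)
    return f_red / safe, f_green / safe, f_blue / safe

f_red = np.array([[2.0, 0.0], [1.0, 3.0]])
f_green = np.array([[1.0, 0.0], [1.0, 3.0]])
f_blue = np.array([[1.0, 0.0], [2.0, 0.0]])

f_r, f_g, f_b = chromaticity(f_red, f_green, f_blue)
lit = (f_red + f_green + f_blue) > 0   # picture points with nonzero intensity
```

Because the three coordinates sum to unity wherever the intensity is nonzero, storing `f_r`, `f_g` and the total intensity is sufficient to recover the original components.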
Figure 2 shows a representation of the 2-dimensional normalized color space
for $f_r$ and $f_g$. Normalizing colors amounts to changing the basis of the
3-dimensional color space from the original red, green and blue coordinates to normalized
red, normalized green and total intensity.



Figure 2. Normalized Color Space 


The amount of information that a color picture can convey to the observer is 
much greater than from a black and white picture. Experiments have indicated 
that although the human eye can differentiate only two or three dozen brightness 
levels it is able to separate thousands of various colors. 


2.2 IMAGE FORMATION 

Images are formed of light reflected from objects. The output of an image
forming system is always degraded to some extent. Blurring is caused by the
diffraction of light through a finite aperture, by aberrations of the system and by
motions of object or imaging systems. Therefore, a practical image will only
approximate the original image given in (9). The information content of the
recorded image is limited by the resolution of the imaging system and by the
presence of noise. The degradation of the original image is thus determined by
the transfer characteristics of the imaging system. Because most image
forming methods involve linear mechanisms, a practical image can be regarded as
an additive superposition of points in the original image. This fact is expressed
in (8), where g(x,y) represents the image intensity recorded by the imaging
system, $f(\xi, \eta)$ is the original object intensity function and $h(x, y; \xi, \eta)$ is the
response in the image coordinates (x,y) to a unit impulse at $(\xi, \eta)$ in the object
coordinates. The impulse response $h(x, y; \xi, \eta)$ is also called the point-spread
function; since it describes an intensity distribution, it must be nonnegative. The
relation (8) holds when the illumination is incoherent and the input and output
functions represent intensity distributions [2].
Generally, the response $h(x, y; \xi, \eta)$ in the image space varies with the position
$(\xi, \eta)$ of the input impulse and is called a space-variant or shift-variant
point-spread function. If h is a function only of the differences in the coordinates,
then $h(x - \xi, y - \eta)$ characterizes a shift-invariant or space-invariant system
and the superposition integral (8) becomes a convolution integral (10).

$$ g(x, y) = \int_{-\infty}^{\infty} \int_{-\infty}^{\infty} f(\xi, \eta)\, h(x - \xi, y - \eta)\, d\xi\, d\eta \tag{10} $$


In any practical imaging system, the observed image g(x,y) will always be
corrupted by noise of some kind. For example, if g(x,y) has been recorded on
film, there is grain noise, and when g(x,y) is sampled for digital processing,
there is quantization noise. Thus, besides the noise-free problem (8) there is
the corresponding problem (11) when noise is added to the system output.

$$ g(x, y) = \int_{-\infty}^{\infty} \int_{-\infty}^{\infty} f(\xi, \eta)\, h(x - \xi, y - \eta)\, d\xi\, d\eta + n(x, y) \tag{11} $$


The assumption that noise is additive is subject to criticism. Some noise sources 
like granularity in photographic recording can be more accurately modeled as 
multiplicative [3]. However, the additive assumption is common to almost all
work in image processing because it makes the problem mathematically tract- 
able. Some cases in which multiplicative noise can be handled are discussed 
in [4] . 

Degradations due to motion blurring and geometrical distortion may also limit 
the resolution in the recorded image. Examples are images taken from aircraft 
and spacecraft. Image formation in this case can also be modeled by the linear 
space-variant system (8). These systems can be analyzed by decomposition 
into geometric coordinate distortions and a space-invariant operation [5,6].

For practical systems the point spread function h(x,y) decreases to zero or to
negligible values for large |x| and |y|. Therefore, it can be assumed that
$h(x, y) = 0$ for $|x| > x_L$, $|y| > y_L$ and (10) can be written

$$ g(x, y) = \int_{x - x_L}^{x + x_L} \int_{y - y_L}^{y + y_L} f(\xi, \eta)\, h(x - \xi, y - \eta)\, d\eta\, d\xi \tag{10a} $$

Considering the finite extent of the picture function f(x,y), the convolution can
be computed for $x_L \le x \le x_M - x_L$, $y_L \le y \le y_N - y_L$.
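A direct discrete counterpart of (10a) can be sketched as follows. The array layout is an assumption made for illustration: the point spread function is stored in an array of odd size whose central element corresponds to h(0,0), and the output covers only those positions for which the whole support of h lies inside the picture.

```python
import numpy as np

def convolve_valid(f, h):
    """Discrete form of (10a), evaluated only where the support of h fits inside f.

    h has shape (2*x_l + 1, 2*y_l + 1); h[x_l, y_l] holds the value h(0, 0).
    """
    x_l, y_l = h.shape[0] // 2, h.shape[1] // 2
    n_x, n_y = f.shape
    g = np.zeros((n_x - 2 * x_l, n_y - 2 * y_l))
    h_flipped = h[::-1, ::-1]   # realizes the argument (x - xi, y - eta)
    for x in range(x_l, n_x - x_l):
        for y in range(y_l, n_y - y_l):
            window = f[x - x_l:x + x_l + 1, y - y_l:y + y_l + 1]
            g[x - x_l, y - y_l] = np.sum(window * h_flipped)
    return g

# A single point source reproduces the point spread function in the output.
f = np.zeros((7, 7))
f[3, 3] = 1.0
h = np.arange(9.0).reshape(3, 3)
g = convolve_valid(f, h)
```

The output is smaller than the input by the width of the point spread function on each side, which corresponds to the restricted range of x and y stated above.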


2.3 MATHEMATICAL OPERATIONS ON IMAGES 

Image processing involves the transformation of an image from one form into
another. Two distinct kinds of processing are generally possible. One uses a
form of transformation which results in a new image that differs from the
original in some desirable way. The result of the other kind is not an image
but may take the form of a decision, description or a parameterization.

The first kind of image processing can be thought of as image coding, image 
restoration and image enhancement. It involves the transformation of an image 
by restoration operators and enhancement operators. Restoration operators 
compensate for image degradation and permit the reconstruction of the original 
image. Degradation and restoration are thus inverse operations. Since restora- 
tion attempts to compensate for distortions caused in part by linear mechanisms, 
linear processing methods are used extensively. Enhancement operators change 
the image further rather than restore it in order to emphasize certain features 
for the human observer. This selective accentuation and suppression reduces 
the information content of an image and it is important to decide what to retain. 
The mathematical theory of integral transformations is a suitable framework 
for describing many aspects of restoration, enhancement and object detection. 

The second kind of image processing deals with structural operators or feature
extractors and syntactic schemata. It is a mapping from pictures to descriptions
of pictures. In examining a picture one is often interested only in extracting
from it a description of what it depicts. This is the problem of pictorial pattern
recognition and scene analysis. The desired description may be merely a
classification of the picture into one of a small set of prespecified classes. The
description may also involve properties of, and relationships among, objects that
appear in the picture. To obtain such a description, it is usually necessary to
explicitly locate the objects in the picture and to measure their properties and
interrelationships. Picture descriptions in terms of objects, properties and
relationships can be expressed using special picture languages [7].
Mathematical methods useful for this class of image processing include statistics, graph
theory, theory of formal languages and automaton theory.

The integral transform F(u,v) of a function f(x,y) is defined by

$$ F(u, v) = \int \int f(x, y)\, K(u, v, x, y)\, dx\, dy \tag{12} $$

where K(u,v,x,y) is called the kernel of the transformation. The limits of
integration may be finite or infinite. If the kernel K and the transform F are
known, then the process of determining a solution f of the integral equation is
called inversion. The properties of the original function f and the kernel K over
the regions of integration determine the existence of the integral transform F
and the existence of an inversion formula giving f as an integral transform of F:

$$ f(x, y) = \int \int F(u, v)\, K^{-1}(u, v, x, y)\, du\, dv \tag{13} $$


Mathematical conditions can be found in [16]. If this formalism is used to
describe two-dimensional image formation, then f represents the original image,
the kernel represents the action of the imaging system and the transform gives
the resultant image. If the image forming process can be modeled by a linear,
space-invariant system, the transformation can be described by a linear
homogeneous operator. In this case a large body of mathematical theory becomes
available to image processing.

Since any homogeneous linear operator is equivalent to a convolution integral, 
those linear integral transforms for which there are convolution and inversion 
theorems are most useful in image processing. Convolution and inversion 
theorems are known for several kernels, such as the Fourier, Hankel, Laplace, 
Mellin, Hadamard and Haar-Walsh transforms. Another useful property of a 
kernel is separability. A kernel is said to be separable if it can be written as 
a product of two functions 


$$ K(u, v, x, y) = K_1(x, u) \cdot K_2(y, v) \tag{14} $$
Separability allows two-dimensional transforms to be reduced to a sequence of
one-dimensional transforms, which facilitates the computation considerably. The
transformation kernel is called separable symmetric if

$$ K(u, v, x, y) = K_1(x, u) \cdot K_1(y, v) \tag{15} $$

Since the statistical intensity variations of most images are nearly the same in
the vertical and horizontal directions, only separable symmetric kernels need to
be considered [17].
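For sampled pictures a separable kernel turns the double sum into two matrix products. The sketch below assumes the discrete Fourier kernel as the one-dimensional kernel and compares the result with NumPy's two-dimensional transform; the helper names `dft_matrix` and `separable_transform` are illustrative only.

```python
import numpy as np

def dft_matrix(n):
    """One-dimensional discrete Fourier kernel K_1 as an n x n matrix."""
    k = np.arange(n)
    return np.exp(-2j * np.pi * np.outer(k, k) / n)

def separable_transform(f, k_x, k_y):
    """F[u, v] = sum over x, y of k_x[u, x] f[x, y] k_y[v, y].

    The left product transforms along x, the right product along y.
    """
    return k_x @ f @ k_y.T

rng = np.random.default_rng(2)
f = rng.random((4, 6))
F_separable = separable_transform(f, dft_matrix(4), dft_matrix(6))
F_reference = np.fft.fft2(f)
```

When the picture is square, the same matrix serves for both directions, which is the separable symmetric case (15).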

The image forming model (10) can be mathematically described by an operator S.
If $T_{\xi,\eta}$ denotes a translation operator

$$ [T_{\xi,\eta}(f)](x, y) = f(x - \xi, y - \eta) \tag{16} $$

then S satisfies the following criteria:

a) S is linear, i.e. it satisfies the superposition property (2)

b) S is homogeneous or position invariant, i.e. it commutes with every
translation operator $T_{\xi,\eta}$

$$ S(T_{\xi,\eta}(f)) = T_{\xi,\eta}(S(f)) \tag{17} $$

Equation (10) can also be written as

$$ g(x, y) = f(x, y) * h(x, y) \tag{18} $$


where * denotes the two-dimensional convolution operation. 

Two useful operations in image processing which can be derived from convolution
are crosscorrelation and autocorrelation. They can be used to measure
the goodness of match of two patterns. Let $\tilde{h}$ denote the result of reflecting h in
the origin, $\tilde{h}(x, y) = h(-x, -y)$ for all x, y. The convolution $f * \tilde{h}$ is called the
crosscorrelation of f and h and is denoted by $f \otimes h$. Thus,

$$ f \otimes h = \int_{-\infty}^{\infty} \int_{-\infty}^{\infty} h(\xi, \eta)\, f(\xi + x, \eta + y)\, d\xi\, d\eta \tag{19} $$

The crosscorrelation of f with itself, $f \otimes f$, is called the autocorrelation of f [7].
Symmetry and antisymmetry properties of convolution and correlations are useful
in image analysis.
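The use of (19) as a measure of match can be sketched for sampled pictures. The example assumes a small pattern h embedded in an otherwise empty picture f and evaluates the correlation sum only for shifts at which h lies entirely inside f; the function name is illustrative.

```python
import numpy as np

def cross_correlate(f, h):
    """c[x, y] = sum over (xi, eta) of h[xi, eta] * f[xi + x, eta + y]."""
    m, n = h.shape
    c = np.zeros((f.shape[0] - m + 1, f.shape[1] - n + 1))
    for x in range(c.shape[0]):
        for y in range(c.shape[1]):
            c[x, y] = np.sum(h * f[x:x + m, y:y + n])
    return c

pattern = np.array([[1.0, 2.0], [3.0, 4.0]])
f = np.zeros((8, 8))
f[3:5, 5:7] = pattern                      # pattern placed at shift (3, 5)

c = cross_correlate(f, pattern)
best_shift = tuple(int(i) for i in np.unravel_index(np.argmax(c), c.shape))
```

The unnormalized sum also responds strongly to bright regions of the picture, so in practice some form of normalization is usually applied before the peak is interpreted as a match.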

Among the various possible transforms the Fourier transform has been the most
useful in enhancement and object detection for several reasons. It can be readily
interpreted in terms of spatial frequency, has a separable kernel and can be
efficiently implemented by digital as well as electro-optical and optical means
[8,9].

2.3.1 Fourier Analysis in Two Dimensions 

This section will summarize the definition, some existence conditions and properties
of the two-dimensional Fourier-transform. No attempt at great mathematical
rigor is made, but rather an operational approach will be adopted.

The Fourier-transform of a function g of two independent variables will be
represented by $\mathcal{F}\{g\}$ and is defined by

$$ \mathcal{F}\{g\} = G(u, v) = \int_{-\infty}^{\infty} \int_{-\infty}^{\infty} g(x, y)\, e^{-2\pi i (ux + vy)}\, dx\, dy \tag{20} $$


The Fourier-transform is a complex-valued function of two independent variables,
u and v, which are referred to as spatial frequencies. Similarly, the inverse
Fourier-transform of a function G(u,v) will be represented by $\mathcal{F}^{-1}\{G\}$ and is
defined as

$$ \mathcal{F}^{-1}\{G\} = \int_{-\infty}^{\infty} \int_{-\infty}^{\infty} G(u, v)\, e^{2\pi i (ux + vy)}\, du\, dv \tag{21} $$
Before discussing the properties of the Fourier-transform and its inverse, the
question of the existence of the transform pair (20) and (21) shall be briefly
elaborated upon. For certain functions, these integrals may not exist in the
usual mathematical sense. The following set of sufficient conditions is
commonly quoted [10]:

1. g must be absolutely integrable over the entire picture plane. 

2. Any discontinuities in g are finite. 

3. g must have only a finite number of discontinuities and a finite number 
of maxima and minima in any finite rectangle. 

Although some functions (e.g., periodic functions) do not have a
Fourier-transform, as may be verified by reference to the conditions above, it is
possible to find a meaningful transform, provided the functions can be defined as
the limit of a sequence of functions that are transformable. By transforming
each member function of the defining sequence, a corresponding sequence of
transforms is generated and the limit of this sequence is called the generalized
Fourier-transform of the original function [10]. To illustrate the calculation
of a generalized transform, consider the Dirac δ function which violates
existence condition 2. Each member function of the defining sequence (4)
satisfies the existence requirements and has a Fourier-transform given by

$$ \mathcal{F}\{ N^2 e^{-N^2 \pi (x^2 + y^2)} \} = e^{-\pi (u^2 + v^2)/N^2} $$

Accordingly the generalized Fourier-transform of δ(x,y) is

$$ \mathcal{F}\{ \delta(x, y) \} = \lim_{N \to \infty} e^{-\pi (u^2 + v^2)/N^2} = 1 \tag{22} $$

The question of the existence of a Fourier-transform will be of no further con- 
cern, for practical difficulties are rarely if ever caused by problems involving 
the existence of the defining integrals. 

Some properties of the transform that will be useful are [2, 10]

Linearity:

$$ \mathcal{F}\{ a f_1(x, y) + b f_2(x, y) \} = a F_1(u, v) + b F_2(u, v) \tag{23} $$
Scaling:

$$ \mathcal{F}\{ f(ax, by) \} = \frac{1}{|ab|}\, F\!\left( \frac{u}{a}, \frac{v}{b} \right) \tag{24} $$


Shifting:

$$ \mathcal{F}\{ f(x - a, y - b) \} = F(u, v)\, e^{-2\pi i (ua + vb)} \tag{25} $$

That is, the Fourier-transform is not a shift-invariant operation. Translation
of a function in the space domain introduces a linear phase shift in the frequency
domain.

Convolution:

$$ \mathcal{F}\{ f(x, y) * h(x, y) \} = F(u, v)\, H(u, v) \tag{26} $$


The convolution of two functions in the space domain is equal to the inverse 
Fourier-transform of the product of their Fourier-transforms. 
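For sampled pictures the corresponding statement holds for the discrete Fourier-transform and circular (periodic) convolution. The sketch below assumes periodic boundaries, which differ from the infinite integrals of (10), and compares a direct summation with the transform route.

```python
import numpy as np

def circular_convolve(f, h):
    """g[x, y] = sum over (xi, eta) of f[xi, eta] * h[x - xi, y - eta], indices taken modulo the size."""
    g = np.zeros(f.shape)
    for xi in range(f.shape[0]):
        for eta in range(f.shape[1]):
            g += f[xi, eta] * np.roll(h, shift=(xi, eta), axis=(0, 1))
    return g

rng = np.random.default_rng(3)
f = rng.random((6, 5))
h = rng.random((6, 5))

g_direct = circular_convolve(f, h)
g_fourier = np.real(np.fft.ifft2(np.fft.fft2(f) * np.fft.fft2(h)))
```

The direct sum requires on the order of the square of the number of picture elements in operations, whereas the transform route replaces it by three transforms and one pointwise product.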

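Autocorrelation:

$$ \mathcal{F}\left\{ \int_{-\infty}^{\infty} \int_{-\infty}^{\infty} f(\xi, \eta)\, f^*(\xi - x, \eta - y)\, d\xi\, d\eta \right\} = |F(u, v)|^2 \tag{27} $$

The Fourier-transform of the autocorrelation function is the power spectrum
of a signal.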

Parseval's Theorem:

$$ \int_{-\infty}^{\infty} \int_{-\infty}^{\infty} f(x, y)\, g^*(x, y)\, dx\, dy = \int_{-\infty}^{\infty} \int_{-\infty}^{\infty} F(u, v)\, G^*(u, v)\, du\, dv \tag{28} $$

This theorem is generally interpretable as a statement of conservation of energy.
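A discrete version of (28) can be verified with the discrete Fourier-transform. The sketch assumes NumPy's unnormalized forward transform, for which the frequency-domain sum must be divided by the number of picture elements; this factor is a property of that convention and does not appear in the continuous theorem.

```python
import numpy as np

def inner_space(f, g):
    """Sum of f g* over all picture elements."""
    return np.sum(f * np.conj(g))

def inner_frequency(f, g):
    """Sum of F G* over all spatial frequencies, scaled for the unnormalized DFT."""
    F, G = np.fft.fft2(f), np.fft.fft2(g)
    return np.sum(F * np.conj(G)) / f.size

rng = np.random.default_rng(4)
f = rng.random((4, 5)) + 1j * rng.random((4, 5))
g = rng.random((4, 5)) + 1j * rng.random((4, 5))

lhs = inner_space(f, g)
rhs = inner_frequency(f, g)
energy_space = inner_space(f, f).real       # total energy in the space domain
energy_frequency = inner_frequency(f, f).real
```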


Gradients:

$$ \mathcal{F}\{ \nabla^2 f(x, y) \} = -4\pi^2 (u^2 + v^2)\, F(u, v) \tag{29} $$


Inversion:

$$ \mathcal{F}^{-1}\{ \mathcal{F}\{ f(x, y) \} \} = f(x, y) \tag{30} $$


Conjugate symmetry:
If f(x,y) is real, then

$$ \mathcal{F}\{ f(x, y) \} = F(u, v) = F^*(-u, -v) \tag{31} $$
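Property (31) carries over to the discrete Fourier-transform of a real sampled picture, with negative frequencies taken modulo the picture size. The following sketch assumes that indexing convention; the helper `reflect` is illustrative.

```python
import numpy as np

def reflect(F):
    """Return the array whose element (u, v) is F[(-u) mod M, (-v) mod N]."""
    return np.roll(F[::-1, ::-1], shift=(1, 1), axis=(0, 1))

rng = np.random.default_rng(5)
f = rng.random((6, 5))            # real picture
F = np.fft.fft2(f)
is_conjugate_symmetric = bool(np.allclose(F, np.conj(reflect(F))))
```

One consequence is that only about half of the transform coefficients of a real picture need to be computed or stored.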


The two-dimensional Fourier-transform is just one way of decomposing a func- 
tion g(x,y) into a set of coefficients of orthogonal functions. Any orthogonal 
transformation will provide a similar decomposition. The advantage of the 
Fourier-transform is that the orthogonal functions are trigonometric functions 
which are the eigenvector solutions of linear space-invariant systems. 

In the following, the intuitive consequences of the definition of the
Fourier-transform pair shall be considered. The transform of a picture function
defines the weighting coefficients of its expansion (21) in a sum of complex
exponentials. To understand the transform, a feeling for the pictorial appearance
of the exponential

$$ e^{2\pi i (ux + vy)} $$

for a given value of u and v is needed. The locus of points in the x-y plane
for which this complex function is real and positive (and thus has zero phase) is given by

$$ y = -\frac{u}{v}\, x + \frac{n}{v}, \qquad n \text{ integer} $$
which is a set of parallel lines whose slope and spacing is determined by u and
v. There are, thus, two geometrical aspects associated with a given point (u,v)
in the spatial frequency plane: an orientation θ and a spacing L.
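Both quantities follow from the family of lines ux + vy = n. The sketch below assumes that the orientation is measured as the angle between the x-axis and the normal to the lines, which points along (u, v); the text does not fix this convention, so it should be read as one possible choice. The spacing is the perpendicular distance between adjacent lines.

```python
import math

def stripe_geometry(u, v):
    """Spacing L and orientation theta (radians) of the zero-phase lines u x + v y = n.

    Assumption: theta is the angle of the line normal (u, v) measured from the x-axis.
    The point u = v = 0 has no stripes and raises ZeroDivisionError.
    """
    spacing = 1.0 / math.hypot(u, v)
    theta = math.atan2(v, u)
    return spacing, theta

spacing_a, theta_a = stripe_geometry(3.0, 4.0)
spacing_b, theta_b = stripe_geometry(2.0, 0.0)
```

Points far from the origin of the frequency plane therefore correspond to closely spaced lines, and points on a common ray through the origin share the same orientation.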

It is interesting to visualize the implications of the Fourier-transform F(u,v) of
a real picture f(x,y) being nonzero only at a single spatial frequency (u,v) and, by
consequence of (31), at (-u,-v). Then only two terms will contribute to the
decomposition (21). The picture will look like

$$ F(u, v)\, e^{2\pi i (ux + vy)} + F(-u, -v)\, e^{-2\pi i (ux + vy)} $$


which is real and is like a sinusoidally undulating surface whose crests are a 
set of parallel lines. Hence, each symmetric pair (u,v) and (-u, - v) of spatial 
frequencies contributes to the generalized sum (21) one picture consisting of 
parallel stripes of sinusoidally varying intensity. The greater the magnitude 
of the transform is at (u,v), the more important is this contribution. 

Suppose now that a picture function f(x,y) consists of vertically oriented dark and
light stripes with abruptly changing transitions. The Fourier-transform then
will be nonzero only on the u axis, since the stripes are vertical in the picture.

In general, edges in a picture introduce spatial frequencies along a line in the 
complex frequency plane orthogonal to the edge. Intuitively, high spatial fre- 
quencies correspond to sharp edges, low spatial frequencies to regions of ap- 
proximately uniform grey level. The orientation of a spatial frequency cor- 
responds to the orientation of an edge in the picture. 
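The concentration of the transform along one axis can be observed with a sampled stripe pattern. The sketch assumes stripes whose intensity varies only with x, so that the picture is constant along y, and uses the discrete Fourier-transform with array axes 0 and 1 standing for u and v.

```python
import numpy as np

x = np.arange(16)
profile = ((x // 4) % 2).astype(float)       # square wave in x with period 8
f = np.tile(profile[:, None], (1, 16))       # f[x, y] depends on x only

F = np.fft.fft2(f)
on_axis_energy = np.sum(np.abs(F[:, 0]) ** 2)    # frequencies with v = 0
off_axis_energy = np.sum(np.abs(F[:, 1:]) ** 2)  # all frequencies with v != 0
```

All of the energy lies at v = 0, i.e. on the u axis, and the abrupt transitions spread it over several harmonics of the stripe frequency rather than a single pair of points.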

2.3.2 Image Transformations 

The transformations mentioned above belong to the class of unitary transforma- 
tions. Until recently few systems took advantage of the power of the theory of 
unitary transformations. With the development of fast algorithms which vastly 
decrease the computational requirements, great interest has been generated in 
digital spectral decomposition [8,9] . The theory of unitary transformations 
can provide powerful processing methods. One of the most appealing justifica- 
tions for the use of unitary transforms lies in the properties of its eigenvalues 
and eigenvectors which define optimal solutions to various systems in the theory 
of conventional as well as stochastic systems. 

In fact, consider the linear shift-invariant system (10) with impulse response
h and denoted by the operator $\mathcal{L}$. If the input function is of the form
$f(x, y) = e^{2\pi i (ux + vy)}$, the corresponding response is given by

$$ \mathcal{L}\{ e^{2\pi i (ux + vy)} \} = e^{2\pi i (ux + vy)} \int_{-\infty}^{\infty} \int_{-\infty}^{\infty} h(\xi, \eta)\, e^{-2\pi i (u\xi + v\eta)}\, d\xi\, d\eta $$

The integral is called the system transfer function

$$ H(u, v) = \int_{-\infty}^{\infty} \int_{-\infty}^{\infty} h(\xi, \eta)\, e^{-2\pi i (u\xi + v\eta)}\, d\xi\, d\eta \tag{32} $$

H(u,v) is the Fourier-transform of $h(\xi, \eta)$ and

$$ \mathcal{L}\{ e^{2\pi i (ux + vy)} \} = H(u, v)\, e^{2\pi i (ux + vy)} \tag{33} $$

This shows that $e^{2\pi i (ux + vy)}$ is an eigenfunction of $\mathcal{L}$ for any u and v and that the
values H(u,v) are the corresponding eigenvalues of $\mathcal{L}$.

In the area of stochastic processes the decomposition of a nonperiodic random
process into a series of orthogonal functions is given by the Karhunen-Loève
theorem [18]. Each element of a visual image can be considered as a random
variable which is proportional to the intensity of the scene at the element
location. The entire pattern can then be considered as a random process and the
pattern statistics can be described by the covariance matrix C determined over
all patterns. If C is known, the Karhunen-Loève transformation provides
maximum compression of the image information resulting in the minimum set of
features necessary to obtain class separation.

The transformation is found by determining the eigenvectors of C defined by

$$C\,\Phi = \lambda\,\Phi$$

where $\lambda$ denotes the corresponding eigenvalues. The set of eigenvectors
$\Phi$ determines a new pattern space where each element is uncorrelated. The
picture information is generally distributed over many elements in the
original space and only over a few in the transformed space if
the correlation between elements is high.
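
The decorrelating property can be illustrated with a short numerical sketch. It assumes NumPy; the function name `karhunen_loeve` and the synthetic test patterns are hypothetical and serve only to show the eigenvector computation on a small covariance matrix.

```python
import numpy as np

def karhunen_loeve(patterns):
    """Return (eigenvalues, eigenvectors, transformed patterns).

    patterns has shape (num_patterns, num_elements), one pattern per row.
    """
    centered = patterns - patterns.mean(axis=0)
    # Covariance matrix C determined over all patterns.
    c = centered.T @ centered / (len(patterns) - 1)
    # C is symmetric, so eigh applies; it solves C phi = lambda phi.
    eigenvalues, phi = np.linalg.eigh(c)
    order = np.argsort(eigenvalues)[::-1]          # largest variance first
    eigenvalues, phi = eigenvalues[order], phi[:, order]
    return eigenvalues, phi, centered @ phi

rng = np.random.default_rng(0)
# Hypothetical data: 500 patterns of 4 strongly correlated elements.
base = rng.normal(size=(500, 1))
patterns = base + 0.1 * rng.normal(size=(500, 4))

eigenvalues, phi, transformed = karhunen_loeve(patterns)
c_new = np.cov(transformed, rowvar=False)          # covariance in the new space
```

In the transformed space the covariance matrix is diagonal, and for these highly correlated patterns nearly all of the variance falls on the first element.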

While the eigenvector solution to specific systems provides optimum orthogonal
decomposition, calculation of eigenvalues and eigenvectors is a very difficult
if not numerically impossible task. Consequently suboptimum decompositions
are desirable if they can be efficiently generated by digital processes. In
section 2.4 the discrete Fourier transform as the most important example for
the implementation of a fast decomposition algorithm will be discussed.


2.4 IMPLEMENTATION OF IMAGE PROCESSING OPERATIONS 

There are basically two ways to implement image processing operations, either 
by digital or by optical means. The first way is referred to as digital image 
processing, the second as optical or electron-optical image processing [46,47].
The question which arises is what types of operations can better be done on a 
digital computer than by optics and vice versa. In this section, digital computer 
and coherent image processing techniques are briefly reviewed and compared 
to indicate their relative merits and faults. 

The main uses of coherent optical systems in image processing have been Fourier 
transformation and linear filtering. These operations are possible because of the 
Fourier transforming property of a lens [2] . If a film transparency with ampli- 
tude transmission g (x,y) is placed in the front focal plane of a lens and is il- 
luminated with collimated monochromatic light, then the amplitude of the light in 
the back focal plane of the lens will be the Fourier transform of g (x, y). Linear 
filtering can be performed by putting a transparency with amplitude transmission 
H (u, v) in the back focal plane of the lens and using a second lens whose front 
focal plane coincides with the back focal plane of the first lens. Then at the back 
focal plane of this second lens, the light amplitude will be the convolution of the 
original image with the inverse Fourier transform of the filter amplitude 
transmission. 

Digital image processing uses a digital computer to implement the various
operations on images. This implies that the image has to be in digital form.
Digital images are obtained either directly, as from some imaging systems
aboard spacecraft, or by digitizing film with a film scanner, which is the
usual case for biomedical images. The digital computer is used because of its
great flexibility in the implementation of image processing operations, and
its accuracy. However, conventional digital computers are sequential; they can
perform only one (at most a few) arithmetical operation(s) at a time. Thus,
image processing with digital computers is usually very time consuming. The
convolution of a picture of size N x N with a mask of size M x M, for example,
takes $N^2 \cdot M^2$ multiplications,
which easily becomes prohibitive. The development of fast algorithms, e.g.
the fast Fourier transform, improved the situation considerably. Great savings
could be realized in the processing time required if it were possible to perform
the identical operations for each picture point in parallel. For this purpose
one could use a parallel computer having many processing units that operate
under central control like the ILLIAC IV [11, 12]. A problem is to adapt the
algorithms for parallel processing [13].

A more specifically picture-oriented approach to parallel computation involves
special-purpose digital hardware that can perform operations simultaneously
on each element of an array of numbers [14]. If an entire array can be shifted
in any of the four principal directions and two arrays can simultaneously be
added or multiplied elementwise, digital pictures could be convolved very rapidly.
Only $(N + M - 1)^2$ shifts and multiplications would be required instead of
$N^2 \cdot M^2$ multiplications by the sequential method.
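
The shift-and-multiply idea can be sketched in software, with NumPy slicing standing in for the hardware array shift. This is only an illustration under that assumption: the function name is hypothetical, and the sketch uses one whole-array multiply-add per mask element rather than reproducing the operation count quoted above.

```python
import numpy as np

def convolve_by_shifts(picture, mask):
    """Full 2-D convolution built from whole-array shifts and multiply-adds."""
    n_rows, n_cols = picture.shape
    m_rows, m_cols = mask.shape
    out = np.zeros((n_rows + m_rows - 1, n_cols + m_cols - 1))
    for a in range(m_rows):
        for b in range(m_cols):
            # One "shift" places the picture at offset (a, b); one elementwise
            # multiply-add then updates every affected output element at once.
            out[a:a + n_rows, b:b + n_cols] += mask[a, b] * picture
    return out

pic = np.arange(12.0).reshape(3, 4)
mask = np.array([[1.0, 2.0], [3.0, 4.0]])
out = convolve_by_shifts(pic, mask)
```

The loop runs over mask elements only; all picture elements are handled simultaneously inside each array operation, which is the point of the parallel approach.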

The sequential digital computer with its ability to be programmed for any pro- 
cessing concept is rightly a very useful experimental tool. Even on such a com- 
puter it is usually possible to perform simple, logical and shifting operations 
simultaneously on each binary digit of a computer word. The PAX II picture 
processing system uses this approach [15] . 

2.4.1 Comparison of Coherent Optical and Digital Image Processing

1. Flexibility 

Coherent optical systems are essentially limited to linear operations on the 
amplitude transmission variations of a film transparency. On the other hand, 
digital computers can be used to do linear operations on amplitude, intensity 
or density. More importantly, digital computers can handle nonlinear opera- 
tions, removal of noise, corrections for deficiencies of the imaging system and 
can produce numerical output. Thus, as to flexibility, the digital computer is
superior to optical systems and will probably always be used for preprocessing
of images.

2. Capacity and Speed 

In a coherent optical system, the film is used as storage, resulting in an enor- 
mous capacity. The data on the film can be operated on in parallel, so that the 
speed is limited, in principle, only by the speed of light. A digital computer 
has a limited memory, but a great amount of auxiliary memory can be attached 
to it. Films can also serve as storage if a film scanner is available. However, 
conventional digital computers operate sequentially on the data; they can
perform only one (at most a few) operation(s) at a time. Therefore, for
large images it takes a long time to bring the data into the central processor
and to process them.

3. Accuracy 

In a coherent optical system there are various sources of errors, such as: im- 
perfect optical components, film grain noise and nonlinearity, thickness varia- 
tions of film emulsions, errors in spatial filters and imperfect alignment of the 
optical system. These errors are difficult to control and an accuracy of only 
3 to 5 percent can be expected in a coherent optical system. In digital process- 
ing, there are inherent errors due to sampling and quantization. These errors, 
however, can be made arbitrarily small by increasing sampling rate and quantiza- 
tion levels. In practice, the accuracy of digital computer image processing is 
limited by the resolution of the imaging system or film scanner. 


In summary, the main advantages of a coherent optical system are its storage
capacity and processing speed and the main advantages of a digital computer
are its flexibility and accuracy. Coherent optical systems are suitable for
performing linear operations such as Fourier transforms and linear filtering on
large images. When accuracy or nonlinear operations are required, the digital
computer has to be used. A future image processing laboratory could benefit
from a combined use of coherent optical systems and digital computers. Such
an image processing center would include a general-purpose digital computer
and various special-purpose digital and optical processors. It is also extremely
important that man-machine interaction be made flexible and convenient so that at
each critical stage of the processing, the human operator can examine the results
obtained and decide on what to do next. The data transfer between a digital
computer and a coherent optical system is conceptually possible; however, much
analysis and experimental work needs to be done.

2.4.2 Digital Image Processing 

When pictures are processed by a digital computer, they are usually represented
as discrete arrays of numbers, i.e., as matrices, rather than as functions. Any
M by N matrix $(g_{ij})$ with real, nonnegative elements can be thought of as
defining a digital picture function g. It can be shown that any picture function
is indistinguishable from an M by N digital picture function for sufficiently
large values of M and N. In digital image processing a picture can take on only
a finite set of values; its grey values are quantized. It can also be shown that
any picture function is indistinguishable from a quantized picture function,
provided that sufficiently many levels are allowed [7]. This guarantees that the
mathematical operations discussed in section 2.3 can be performed on digital
pictures.

The question of how finely a picture must be sampled in order to preserve all
its information is answered by the sampling theorem. If the samples are taken
sufficiently close to each other, the sampled data are an accurate
representation of the original function in the sense that this function can be
reconstructed with some accuracy by interpolation. For the class of bandlimited
functions the reconstruction can be accomplished exactly, provided that the
interval between samples is not greater than a certain limit. A function g(x,y)
is called bandlimited if its Fourier transform G(u,v) is zero whenever either
|u| or |v| is greater than some number W. The sampling theorem applies to the
class of bandlimited functions and states that the sampling interval must not
be greater than 1/(2W).
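
What happens when the band limit is violated can be shown with a one-dimensional sketch. It assumes that W is measured in cycles per unit length; the numerical values and the helper name `sample_cosine` are illustrative only.

```python
import math

def sample_cosine(frequency, interval, count):
    """Sample cos(2*pi*frequency*x) at x = 0, interval, 2*interval, ..."""
    return [math.cos(2 * math.pi * frequency * k * interval)
            for k in range(count)]

W = 4.0                      # assumed band limit in cycles per unit length
interval = 1 / (2 * W)       # largest interval allowed by the sampling theorem

inside = sample_cosine(3.0, interval, 16)    # 3 < W: within the band limit
outside = sample_cosine(5.0, interval, 16)   # 5 > W: violates the band limit

# With this interval the 5-cycle cosine yields the same samples as the
# 3-cycle cosine, so the two cannot be told apart after sampling (aliasing).
aliased = all(abs(a - b) < 1e-9 for a, b in zip(inside, outside))
print(aliased)
```

A component above W is thus mapped onto a lower frequency by the sampling, which is why exact reconstruction is guaranteed only for bandlimited functions.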

In digital image processing linear operations such as convolution, transforma- 
tion, and filtering can be succinctly expressed in vector space notation. The 
advantages are a compact notation and the ability to use results derived for one- 
dimensional problems. 

A discrete representation of the convolution integral (10a) can be found by
numerical approximation. The simplest formula would be the trapezoidal rule,
which applied to (10a) gives

$$g(m\Delta x, n\Delta y) \approx \Delta x\, \Delta y \sum_{i=m-L/2}^{m+L/2}\ \sum_{j=n-L/2}^{n+L/2} f(i\Delta x, j\Delta y)\, h[(m-i)\Delta x, (n-j)\Delta y] \tag{34}$$

$$m = \frac{L}{2}, \frac{L}{2}+1, \ldots, M - \frac{L}{2} - 1 \qquad n = \frac{L}{2}, \frac{L}{2}+1, \ldots, N - \frac{L}{2} - 1$$

where $\Delta x = x_M/M$, $\Delta y = y_M/N$, $L = 2x_L/\Delta x = 2y_L/\Delta y$. Hence, the
processed picture is of smaller dimension than the original picture.
Translation and including all quadrature factors in h yields

$$g(m,n) = \sum_{i=m}^{m+L-1}\ \sum_{j=n}^{n+L-1} f(i,j)\, h(m-i+L,\ n-j+L) \tag{35}$$

$$m = 1, 2, \ldots, M' \qquad n = 1, 2, \ldots, N'$$

where M' = M - L and N' = N - L.
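
The double sum of (35) translates directly into nested loops. The sketch below assumes NumPy and 0-based array indices, so that `h[L-1-a, L-1-b]` plays the role of h(m-i+L, n-j+L); the function name is hypothetical. It keeps every position where the mask lies fully inside the picture, which gives M - L + 1 rows, one more than the M' quoted above.

```python
import numpy as np

def convolve_valid(f, h):
    """Evaluate the double sum of (35) wherever the L x L mask fits inside f."""
    M, N = f.shape
    L = h.shape[0]
    g = np.zeros((M - L + 1, N - L + 1))
    for m in range(M - L + 1):
        for n in range(N - L + 1):
            for a in range(L):            # a = i - m
                for b in range(L):        # b = j - n
                    g[m, n] += f[m + a, n + b] * h[L - 1 - a, L - 1 - b]
    return g

f = np.arange(16.0).reshape(4, 4)
shift_mask = np.array([[1.0, 0.0], [0.0, 0.0]])   # picks out f(m+1, n+1)
mean_mask = np.full((2, 2), 0.25)                 # 2 x 2 averaging mask
g_shift = convolve_valid(f, shift_mask)
g_mean = convolve_valid(f, mean_mask)
```

The four nested loops make the operation count visible: each output element costs L x L multiplications followed by additions.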

Let now $\mathbf{f}$ denote a column vector of dimension $M \cdot N$ with the elements of
the digital picture array f(i,j) ordered columnwise. With a similar
interpretation of $\mathbf{g}$ and $\mathbf{n}$, (35) can be written in vector form as

$$\mathbf{g} = B\,\mathbf{f} + \mathbf{n} \tag{36}$$


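where B is an M'N' x MN matrix partitioned as

$$B = \begin{bmatrix}
B_{1,1} & B_{1,2} & \cdots & B_{1,L} & 0 & \cdots & 0 \\
0 & B_{2,2} & \cdots & B_{2,L} & B_{2,L+1} & \cdots & 0 \\
\vdots & & \ddots & & & \ddots & \vdots \\
0 & \cdots & 0 & B_{N',N'} & \cdots & \cdots & B_{N',N'+L-1}
\end{bmatrix} \tag{37}$$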
The M' x M submatrix $B_{pq}$ is given by

$$B_{pq} = \begin{bmatrix}
h(L, p-q+L) & h(L-1, p-q+L) & \cdots & h(1, p-q+L) & 0 & \cdots & 0 \\
0 & h(L, p-q+L) & \cdots & \cdots & h(1, p-q+L) & \cdots & 0 \\
\vdots & & \ddots & & & \ddots & \vdots \\
0 & \cdots & 0 & h(L, p-q+L) & \cdots & \cdots & h(1, p-q+L)
\end{bmatrix} \tag{38}$$

for

$$1 \le p \le N' \qquad p \le q \le L + p - 1$$

The structure of B is given by

$$B_{p,q} = B_{p+1,q+1} \tag{39}$$


and consequently, the rows of B are shifted versions of the first row. The
convolution operation in vector form requires $MNL^2$ operations if the zero
multiplications of B are avoided. The block diagram of the image forming process
expressed in vector form is shown in Figure 3. The degradation is modeled by
the matrix B; the observable vector g is of smaller dimension than the vector f
representing the ideal image. Hence, the solution of (35) is generally not unique.

[Figure 3. Digital Image Formation. Block labels: ideal image f, imaging system, sample pulse, observed image g]
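
The block structure of B can be made concrete by assembling the matrix for a small picture. The following sketch assumes NumPy, 0-based indices, columnwise ordering of the picture elements, and an output of M - L + 1 by N - L + 1 elements (every fully overlapped mask position); `build_b` is a hypothetical helper, not a routine of VICAR or SMIP.

```python
import numpy as np

def build_b(h, M, N):
    """Assemble B so that g = B f for columnwise-ordered vectors f and g."""
    L = h.shape[0]
    Mp, Np = M - L + 1, N - L + 1          # rows and columns of the output
    B = np.zeros((Mp * Np, M * N))
    for p in range(Np):                    # output column = block row
        for q in range(p, p + L):          # input column = block column
            for m in range(Mp):            # row inside the block B_pq
                for a in range(L):
                    B[p * Mp + m, q * M + m + a] = h[L - 1 - a, L - 1 - (q - p)]
    return B

rng = np.random.default_rng(1)
M, N, L = 4, 5, 2
f = rng.random((M, N))
h = rng.random((L, L))
B = build_b(h, M, N)
Mp, Np = M - L + 1, N - L + 1
# Columnwise ordering corresponds to Fortran ("F") order in NumPy.
g = (B @ f.flatten(order="F")).reshape((Mp, Np), order="F")
```

Each row of B holds only L x L nonzero entries, and every block depends on p and q only through q - p, which is the shift structure expressed by (39). Since B has fewer rows than columns, f cannot be recovered uniquely from g.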

2.4.3 Discrete Convolution Theorem 

The discrete convolution theorem is of great importance in digital filtering. It
makes possible the convolution of two digital images by the product of their
discrete Fourier transforms. Hence, the discrete convolution theorem is the basis
for the application of the fast Fourier transform (FFT). The important aspect
to be noted in connection with the discrete convolution theorem is that while the
convolution takes a number of operations proportional to $M \times N \times L^2$ by
the direct method, the FFT method requires a number of operations proportional
to $MN(2\log M + 2\log N + 1)$. Operation means, throughout this report, a
multiplication followed by an addition. The Fourier transform method consists in
transforming the functions f and h, multiplying the transforms and transforming
the product back.
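
The two operation counts can be compared for concrete sizes. The sketch below assumes base-2 logarithms, which the text leaves unstated, and ignores the constants of proportionality; the function names are illustrative.

```python
import math

def direct_operations(M, N, L):
    """Operation count of the direct method, M * N * L^2."""
    return M * N * L ** 2

def fft_operations(M, N):
    """Operation count of the FFT method, MN(2 log M + 2 log N + 1)."""
    # Base-2 logarithms are an assumption of this sketch.
    return M * N * (2 * math.log2(M) + 2 * math.log2(N) + 1)

for L in (3, 7, 15, 31):
    ratio = direct_operations(512, 512, L) / fft_operations(512, 512)
    print(f"L = {L:2d}: direct / FFT = {ratio:.2f}")
```

Under these assumptions the bracket equals 37 for a 512 x 512 picture, so the two counts cross where $L^2$ is about 37, i.e. near L = 6; smaller masks favor the direct method and larger masks the FFT method.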

For the discrete convolution theorem to hold, it is necessary that the discrete 
functions f and h be periodic. These periodic functions are defined in the man- 
ner of Cooley [34] : 


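$$f_p(x,y) = \sum_{j=-\infty}^{\infty}\ \sum_{k=-\infty}^{\infty} f(x + jP,\ y + kP) \tag{40}$$

$$h_p(x,y) = \sum_{j=-\infty}^{\infty}\ \sum_{k=-\infty}^{\infty} h(x + jP,\ y + kP) \tag{41}$$

$$g_p(x,y) = \sum_{j=-\infty}^{\infty}\ \sum_{k=-\infty}^{\infty} g(x + jP,\ y + kP) \tag{42}$$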


The subscript p on a function denotes the periodic function formed by the
superposition of the nonperiodic function shifted by all multiples of a
fundamental period. The period P is the smallest power of 2 greater than or
equal to Q, where Q = max(M,N). Since the original functions are zero outside
their regions of definition, a choice of P suitably large will lead to no
overlap of the functions when the shifted replications are summed. Thus, the
M x N image array f(m,n) and the L x L impulse response array h(k,l) are
embedded in the upper left corner of P x P arrays of zeros.

The circular convolution of the periodic functions is defined as 


$$g_p(m,n) = \sum_{i=1}^{P}\ \sum_{j=1}^{P} f_p(i,j)\, h_p(m-i+1,\ n-j+1) \tag{43}$$

$$m = 1, 2, \ldots, P \qquad n = 1, 2, \ldots, P$$


The circular convolution is equivalent to an ordinary convolution for a proper 
choice of P because the original functions were assumed to be identically zero 
beyond their regions of definition. The periodic replications contain these zero 
values and cancel the periodic "wrap-around". 
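
The equivalence of circular and ordinary convolution after zero padding can be sketched with NumPy's FFT routines. The sketch assumes that P is taken as the smallest power of 2 not less than max(M, N) + L - 1, the usual condition for the wrap-around to fall entirely on zeros; the function name is hypothetical.

```python
import numpy as np

def fft_convolve(f, h):
    """Ordinary convolution of f (M x N) and h (L x L) via circular convolution."""
    M, N = f.shape
    L = h.shape[0]
    needed = max(M, N) + L - 1      # assumed size that avoids wrap-around
    P = 1
    while P < needed:
        P *= 2
    # Embed both arrays in the upper left corner of P x P arrays of zeros.
    fp = np.zeros((P, P))
    fp[:M, :N] = f
    hp = np.zeros((P, P))
    hp[:L, :L] = h
    # The product of the transforms corresponds to circular convolution.
    gp = np.real(np.fft.ifft2(np.fft.fft2(fp) * np.fft.fft2(hp)))
    return gp[:M + L - 1, :N + L - 1]

f = np.array([[1.0, 2.0], [3.0, 4.0]])
h = np.ones((2, 2))
g = fft_convolve(f, h)
```

For this 2 x 2 example the result agrees with the ordinary convolution computed by hand, a 3 x 3 array with rows (1, 3, 2), (4, 10, 6) and (3, 7, 4).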

Equation (43) can be written in vector form as

$$\mathbf{g}_p = B_p\, \mathbf{f}_p \tag{44}$$

where $\mathbf{g}_p$ is a column vector of size $P^2$ whose elements are $g_p(m,n)$,
$\mathbf{f}_p$ is a column vector of dimension $P^2$ and $B_p$ is a matrix of size
$P^2 \times P^2$ which can be partitioned into P x P submatrices $B_{s,t}$.

Each row and column of $B_p$ contains L non-zero submatrices. The structure of
$B_p$ is given by

$$B_{s,t} = B_{s+1,t+1} \tag{46}$$

with

$$s = s \bmod P \qquad t = t \bmod P$$

It should be noted that indices are to be interpreted modulo P. Therefore, a 
row can be obtained by a circular right shift of the row above it. Furthermore, 
the first row is a circular right shift of the last row. Such a matrix is called 
a circulant matrix. The circular behavior is a direct consequence of the fact 
that h p (i,j) is a periodic function. 

It can be shown that the Fourier transform basis vectors are eigenvectors of the
circulant matrix $B_p$ [35]. Now (44) can be written in terms of the diagonal
representation of $B_p$

$$\mathbf{g}_p = W D_h W^{-1}\, \mathbf{f}_p \tag{47}$$

where $D_h$ is the diagonal matrix of eigenvalues of $B_p$ and W is a matrix whose
columns are the eigenvectors of $B_p$. By rearranging this equation into

$$W^{-1} \mathbf{g}_p = D_h W^{-1}\, \mathbf{f}_p \tag{48}$$

it can be seen that $W^{-1}\mathbf{g}_p$ and $W^{-1}\mathbf{f}_p$ are the Fourier transforms of
g and f respectively and the diagonal matrix $D_h$ is the Fourier transform of
$h_p(m,n)$. Consequently (48) is a term-by-term product of Fourier transforms and
is the frequency domain representation of the circular convolution (44). For a
proper choice of P the circular convolution is equivalent to the aperiodic
convolution. This proves that the discrete convolution can be computed by the
fast Fourier transform algorithm.
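
The diagonalization can be verified numerically in the one-dimensional case, which has the same structure as each level of the two-dimensional block circulant. The sketch assumes NumPy; the period P = 8 and the impulse response values are hypothetical.

```python
import numpy as np

P = 8
h = np.zeros(P)
h[:3] = [0.5, 0.3, 0.2]          # hypothetical periodic impulse response

# Circulant matrix: entry (m, i) is h_p(m - i), indices taken modulo P, so
# each row is a circular right shift of the row above it.
B = np.array([[h[(m - i) % P] for i in range(P)] for m in range(P)])

# Columns of W are the Fourier basis vectors exp(2*pi*i*j*k/P).
j, k = np.meshgrid(np.arange(P), np.arange(P), indexing="ij")
W = np.exp(2j * np.pi * j * k / P)

D = np.linalg.inv(W) @ B @ W     # diagonal representation of B
eigenvalues = np.diag(D)
```

The matrix D is diagonal to within rounding error and its diagonal equals the discrete Fourier transform of h, in agreement with (47) and (48).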

2.4.4 Discrete Fourier Transform


The discrete Fourier transform of a periodic digital picture function $f_p$ (40) is
given by

$$F_p(m,n) = \frac{1}{N} \sum_{j=0}^{M-1}\ \sum_{k=0}^{N-1} f_p(j,k)\, \exp\left\{-2\pi i \left[\frac{jm}{M} + \frac{kn}{N}\right]\right\} \tag{49}$$

$$m = 0, 1, \ldots, M-1 \qquad n = 0, 1, \ldots, N-1$$

The discrete inverse transform is given by

$$f_p(j,k) = \frac{1}{M} \sum_{m=0}^{M-1}\ \sum_{n=0}^{N-1} F_p(m,n)\, \exp\left\{2\pi i \left[\frac{jm}{M} + \frac{kn}{N}\right]\right\} \tag{50}$$

$$j = 0, 1, \ldots, M-1 \qquad k = 0, 1, \ldots, N-1$$

For this Fourier series representation to be valid the image must be considered 
periodic horizontally and vertically as shown in Figure 4. The right hand side 
and the left hand side, as well as the top and the bottom of the image are adjacent. 
Spatial frequencies along the coordinate axes of the transform plane arise from 
these transitions. Although these are false spatial frequencies from the stand- 
point of being necessary for representing the image within the image boundary, 
they do not impair reconstruction. On the contrary, these spatial frequencies 
are required to reconstruct the shape boundaries of the image. 
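
The transform pair can be evaluated directly as two matrix products, one per coordinate. The sketch below assumes NumPy and assumes that the normalization is split as 1/N in the forward and 1/M in the inverse transform; only the product 1/(MN) matters for the round trip. The function names are illustrative.

```python
import numpy as np

def dft2(f):
    """Forward transform: sum of f(j,k) exp{-2 pi i [jm/M + kn/N]}, times 1/N."""
    M, N = f.shape
    em = np.exp(-2j * np.pi * np.outer(np.arange(M), np.arange(M)) / M)
    en = np.exp(-2j * np.pi * np.outer(np.arange(N), np.arange(N)) / N)
    return em @ f @ en / N

def idft2(F):
    """Inverse transform: sum of F(m,n) exp{+2 pi i [jm/M + kn/N]}, times 1/M."""
    M, N = F.shape
    em = np.exp(2j * np.pi * np.outer(np.arange(M), np.arange(M)) / M)
    en = np.exp(2j * np.pi * np.outer(np.arange(N), np.arange(N)) / N)
    return em @ F @ en / M

f = np.arange(12.0).reshape(3, 4)
F = dft2(f)
f_back = idft2(F)
```

Apart from the factor 1/N, `dft2` agrees with the unnormalized transform computed by `numpy.fft.fft2`, and `idft2` recovers the original array.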


2.5 EVALUATION OF IMAGE QUALITY 

Image quality is an important concern in image processing. Its meaning can be
clarified by understanding what is to be measured when dealing with images and
by strengthening the bridge between the physical and visual aspects of many
image processing issues. The availability of sophisticated digital methods
supports the need for precision. There is, however, the realization that the lack
of standards for reading images into and writing images out of digital form can
bias the effectiveness of a process and can make uncertain the comparison of
results obtained at different installations.

[Figure 4. Fourier Series Representation of an Image]

Whenever a picture is converted from one form to another, e.g., scanned or
displayed, the question arises how faithfully the information contained in the
input picture is conserved in the output picture. It is difficult to find
subjective distortion measures and a part of the problem stems from the fact
that physical
and subjective distortions are necessarily different. One of the difficulties in 
specifying image quality is that the intended use of the images is not well de- 
fined and it is unlikely that a wide variety of tasks can be covered by a single 
quality criterion. Consideration of the purpose for which an image was re- 
corded and of the interaction of images with the human observer will help to 
evaluate image quality. 

Where the goal is extraction of information and where the image is to be
processed prior to viewing, the information content of the image is a true
evaluation criterion [19]. Since any real imaging system will have limitations
imposed by resolution and noise, there is a fundamental limit to the information
contained in an image. The human visual system is incapable of efficient
extraction of information which is contained in a degraded image. The
improvement achieved by processing can be evaluated by comparing the ability of
the human observer to extract information from the image before and after
processing. This criterion depends on the specific visual task which is to be
performed. When the specific task is not defined, a satisfactory evaluation of
image quality is difficult to achieve.

Some authors have indicated that a signal-to-noise ratio concept should be
useful in the prediction of image quality. An attempt to unify image quality
criteria for resolution and detection used statistical decision theory [20].


2.6 NOISE CHARACTERISTICS AND REMOVAL 

The practical limit to all quantitative and photo interpretive measurements on a 
properly encoded image is the presence of noise. Enhancement processes such 
as filtering to improve image resolution can sharpen features only at the ex- 
pense of over-all signal-to-noise ratio. Therefore, one of the most important 
initial steps in digital image processing is the suppression of noise, so that sub- 
sequent restoration, enhancement and extraction operations can be performed 
on maximum signal-to-noise ratio imagery to achieve optimal results. 

Many noise sources exist in imaging systems ranging from random, wideband 
and thermal noises to highly structured periodic noises. The precise separa- 
tion of any noise from the data of a single frame must be based on quantifiable 
characteristics of the noise signal that distinguish it uniquely from the other 
image components. In space it is generally not possible to obtain multiple
imagery in order to use frame-averaging techniques. In most real situations
only statistical information about the various image components is available 
and, thus, their separation is approximate. The essence of noise removal is to 
isolate and remove the identifiable and characterizable noise components as 
vigorously as possible, so as to do a minimum of damage to the actual image 
data. In most cases, the errors introduced to the real signal by the removal 
process, while small, vary from point to point and can only be measured if de- 
tailed knowledge about the scene being photographed is available. Certainly 
the efficacy of noise removal is data dependent. 

The main types of structured noise appearing in images are periodic, long-line
and spike noises. Periodic noise arises from the coupling of periodic signals
related to the raster-scan and data-sampling drives into the imaging electronics.
For typical spacecraft systems, these periodic noises exhibit phase coherence
over times that are long compared to the frame time of the camera. For this
reason, the periodic noise appears as a two-dimensional pattern exhibiting
periodicity along the scan lines and perpendicular to them [49]. A useful method
for characterizing this periodicity is in terms of a Fourier decomposition. A
first order removal of periodic noise components can be achieved by filtering
in the two-dimensional Fourier domain. The SMIP system provides the
program FOURIER 2 for computing the two-dimensional Fourier transform of an
image, the program FOURPIC to display the amplitude of the Fourier spectrum
and the program FREQFILT for two-dimensional filtering with various filters
in the frequency domain.
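
The principle of removing periodic noise in the Fourier domain can be sketched as follows. This is not the FREQFILT program; it is a minimal illustration assuming NumPy, with a hypothetical function name and a crude rule that treats every spectral component far above the median magnitude as noise. On real scenes such a rule would also remove strong scene components.

```python
import numpy as np

def remove_periodic_noise(image, threshold_ratio=10.0):
    """Zero spectral components whose magnitude stands far above the median."""
    spectrum = np.fft.fft2(image)
    magnitude = np.abs(spectrum)
    mask = magnitude > threshold_ratio * np.median(magnitude)
    mask[0, 0] = False                  # keep the mean grey level
    spectrum[mask] = 0
    return np.real(np.fft.ifft2(spectrum)), mask

rng = np.random.default_rng(2)
scene = rng.normal(size=(32, 32))       # hypothetical wideband scene
x = np.arange(32)
noise = 5.0 * np.cos(2 * np.pi * 4 * x / 32)[None, :]   # periodic along scan lines
noisy = scene + noise

cleaned, removed = remove_periodic_noise(noisy)
rms_before = np.sqrt(np.mean((noisy - scene) ** 2))
rms_after = np.sqrt(np.mean((cleaned - scene) ** 2))
```

The periodic component occupies only a symmetric pair of spectral points, so setting them to zero removes the noise at the cost of a small error in the scene itself.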

Long-line or streak noise is produced by a variety of mechanisms such as gain 
variations, data outages, and tape recorder dropouts. This type of noise be- 
comes apparent as horizontal streaks especially after removal of periodic noise. 
It is characteristic for images from the ERTS-1 multispectral scanner. The 
characteristic that distinguishes streak noise from the actual scene is its cor- 
relation along the scan-line direction and the lack of correlation in the perpen- 
dicular direction. This distinction is not complete since linear features are 
present in some natural scenes and noise removal based on this characteristic 
may result in major damage to the true signal in regions that contain scene 
components resembling the noise. 

A technique to correct for streak noise is to compare the local average intensity 
value of lines adjacent and parallel to the streaks with the average value of the 
streak itself and to apply a gain factor to account for any differences [50] . A 
multiplicative rather than an additive correction is applied because the physical 
origin of the noise is multiplicative (magnetic tape dropouts). This correction 
is particularly data dependent in its effect and although providing a global im- 
provement it may introduce artifacts in the detail. The program ERTSFIX in 
the VICAR and SMIPS libraries performs streak corrections on ERTS-MSS 
images. 
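
The gain correction described above can be sketched for the simple case in which the streak lines are already known. This is not the ERTSFIX program; the function name, the use of whole-line averages, and the restriction to the two adjacent lines are assumptions of the sketch, which requires NumPy.

```python
import numpy as np

def correct_streaks(image, streak_rows):
    """Rescale each listed line so that its mean matches that of its neighbours."""
    corrected = image.astype(float).copy()
    for r in streak_rows:
        neighbours = [n for n in (r - 1, r + 1)
                      if 0 <= n < image.shape[0] and n not in streak_rows]
        if not neighbours:
            continue                    # no clean reference line available
        reference = np.mean([image[n].mean() for n in neighbours])
        streak_mean = image[r].mean()
        if streak_mean > 0:
            corrected[r] *= reference / streak_mean   # multiplicative gain
    return corrected

image = np.full((5, 6), 100.0)
image[2] *= 0.5                         # simulated gain dropout on one line
fixed = correct_streaks(image, [2])
```

Because the correction relies on the averages of neighbouring lines, genuine scene differences between adjacent lines are also altered, which is the data dependence noted above.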

Spike noise is caused by bit errors in data transmission or the occurrence of
temporally sharp disturbances in the analog electronics. It produces isolated
picture elements that deviate significantly from the surrounding data. Spike
noise can be removed with a simple technique. Each picture element is
examined and if it is significantly above or below each of its neighbors, it is
replaced by the average neighboring intensity. Using the digital computer to
isolate and remove various structured noise components from the raw images
can considerably improve the signal-to-noise ratio. This preprocessing already
produces an enhancement that allows analysis of detail closer to the
resolution limits of the imaging system.
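
The spike removal rule can be sketched directly. The sketch assumes NumPy, uses the eight surrounding elements as neighbors, leaves the border elements untouched, and takes a fixed threshold as the meaning of "significantly"; all of these choices and the function name are assumptions.

```python
import numpy as np

def remove_spikes(image, threshold):
    """Replace elements that lie above or below all 8 neighbours by their mean."""
    out = image.astype(float).copy()
    rows, cols = image.shape
    for r in range(1, rows - 1):
        for c in range(1, cols - 1):
            window = image[r - 1:r + 2, c - 1:c + 2].astype(float)
            neighbours = np.delete(window.ravel(), 4)   # the 8 surrounding elements
            centre = float(image[r, c])
            above = np.all(centre > neighbours + threshold)
            below = np.all(centre < neighbours - threshold)
            if above or below:
                out[r, c] = neighbours.mean()
    return out

image = np.full((5, 5), 10.0)
image[2, 2] = 200.0                      # isolated spike
cleaned = remove_spikes(image, threshold=50.0)
```

Requiring the element to differ from every neighbor, rather than from their average, keeps edges and lines intact, since an element on an edge always has some neighbors of similar value.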


3. IMAGE RESTORATION


The extraction of features or objects from an image may not be possible if it is
severely distorted. Restoration compensates for the distortions and attempts
reconstruction of the original image. Degradations may be caused by the various
factors discussed in section 2.2. In terms of the model for image formation
expressed by (11) the restoration task can be defined as follows: given g,
utilize the a priori information about h, f and n to make a good estimate
$\hat{f}(x,y)$ of f. The various restoration schemes differ from each other in
the assumed a priori information as well as in the criterion by which the
goodness of the estimate is judged.

Frequently, the a priori information is little more than a reasonable assumption. 
All that is usually known about the desired picture is that f (x,y) is nonnegative, 
bounded and zero outside some region. Likewise, a priori information about the 
noise is usually very meager and often it is assumed that n has a known constant 
spectral density. The impulse response h can be known, either from theory or 
calibration of the imaging system, or it may be unknown. It is assumed that the 
restored image is to be viewed by a human observer. This requires that the 
restoration should redisplay the information in the degraded image in such a way 
that it enables a human observer to identify the original objects with as much de- 
tail as possible. Because of the restrictions discussed in section 2.5 on image 
quality, no absolute criterion by which the goodness of the estimate is to be 
judged can be specified. Each restoration scheme assumes some intuitively
reasonable criterion of goodness. This reflects ignorance of the optimum
tradeoff between resolution and noise.

Restorative techniques can be classified into 1) inverse filtering, 2) optimal fil- 
tering, 3) constrained deconvolution, 4) restoration with correction tables and 
interpolating functions and 5) other methods. In the discussion of these methods 
it is assumed that the impulse response of the degrading system is known. 


3.1 INVERSE FILTERING 

Inverse filtering attempts perfect restoration without regard to noise. Fourier
transformation of the noise-free model (10) gives

$$G(u,v) = H(u,v) \cdot F(u,v) \tag{51}$$

in the spatial frequency domain. Formally, the restored image $\hat{f}$ is obtained
by reinverting

$$\hat{f}(x,y) = \mathcal{F}^{-1}\left\{ \frac{G(u,v)}{H(u,v)} \right\} \tag{52}$$

where $\mathcal{F}^{-1}$ denotes the inverse Fourier transform,
provided H does not vanish at any point (u,v). The method fails if H has zeros
at spatial frequencies within the range of interest. The second flaw is that H
decreases rapidly for large values of u and v whereas noise has a fairly uniform
spectral distribution. From (52) it is apparent that the restoration enhances
high-frequency noise. Modifications have been suggested to overcome these
drawbacks [19]. A simple and efficient method is to limit the amplitudes of
$H^{-1}$ at higher frequencies in order to limit enhancement of high-frequency
noise or to replace $H^{-1}$ by zero in the range of (u,v) over which the noise is
larger than the signal.
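
The amplitude-limited inverse filter can be sketched with the discrete Fourier transform. The sketch assumes NumPy, treats the degradation as a circular convolution, and uses a hypothetical parameter `max_gain` as the amplitude limit; where H is numerically zero the inverse is replaced by zero.

```python
import numpy as np

def inverse_filter(g, h, max_gain=20.0):
    """Restore g by multiplying its transform with an amplitude-limited 1/H."""
    H = np.fft.fft2(h, s=g.shape)       # transfer function on the image grid
    G = np.fft.fft2(g)
    inv = np.zeros_like(H)
    nonzero = np.abs(H) > 1e-12
    inv[nonzero] = 1.0 / H[nonzero]     # stays zero where H vanishes
    # Limit |1/H| so that frequencies with small H are not amplified unboundedly.
    gain = np.abs(inv)
    too_large = gain > max_gain
    inv[too_large] *= max_gain / gain[too_large]
    return np.real(np.fft.ifft2(G * inv))

rng = np.random.default_rng(3)
f = rng.random((16, 16))
h = np.array([[0.6, 0.2], [0.2, 0.0]])  # hypothetical impulse response, |H| >= 0.2
g = np.real(np.fft.ifft2(np.fft.fft2(f) * np.fft.fft2(h, s=f.shape)))
restored = inverse_filter(g, h)
```

For this noise-free example H has no zeros and the restoration is exact to within rounding error; with noise present, lowering `max_gain` trades resolution against noise amplification.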


3.2 OPTIMAL FILTERING 

A way to avoid the arbitrariness of the inverse filtering approach is to minimize
the discrepancy between f and $\hat{f}$. The measure of this discrepancy should ideally
correspond to that of the human visual system. Because this is not known in 
detail an alternative is to find the optimum restoration for a simpler objectively 
defined criterion such as the minimum mean- square- error (MSE) criterion 
for which the optimum restoration can be computed. From the point of view of 
image restoration for a human observer minimizing the MSE is an inadequate 
criterion. It is well known that the eye demands much more faithful reproduc- 
tion of regions where the intensity changes rapidly than of regions with little 
change. It is also known that the sensitivity of the eye to a given error in in- 
tensity depends strongly on the intensity. The minimum MSE criterion, on the 
other hand, weighs an error independently of the intensity at which it occurs. 

It has been shown, however, that the method is capable of producing very good 
restorations [21,22]. 

In the following, a brief outline of the mathematical derivation of the optimum
restoration for digital images will be given. Picture and noise are considered
to be members of the random processes {f} and {n}, respectively. Given (36),
a restoring matrix W has to be determined such that the estimate

$$\hat{\mathbf{f}} = W \mathbf{g} \tag{53}$$

minimizes the error measure

$$e = E\{(\hat{\mathbf{f}} - \mathbf{f})^T (\hat{\mathbf{f}} - \mathbf{f})\} \tag{54}$$

where E denotes the expected value. If it is assumed that f and n are uncorrelated
stationary Gaussian random vectors with zero mean, the error measure becomes

$$e = \mathrm{tr}\left[(WB - I)\, K_f\, (WB - I)^T + W K_n W^T\right] \tag{55}$$

Here tr denotes the trace of a matrix, I is the identity matrix and $K_f$ and $K_n$
are the covariance matrices $E(\mathbf{f}\mathbf{f}^T)$ and $E(\mathbf{n}\mathbf{n}^T)$ respectively. The
minimum for e in (55) occurs for

$$W = K_f B^T (B K_f B^T + K_n)^{-1} \tag{56}$$

Note that for $K_n = 0$ the optimum filter becomes the inverse filter. Figure 5
shows a block diagram of Wiener filtering.

As to the assumptions, the Gaussian hypothesis cannot be valid because the {f}
and {n} processes are ensembles of positive functions. Thus, the optimum
restoration filter derived does not minimize the MSE. However, of all estimates
obtainable by spatially invariant filtering of g, the linear estimator (56) gives the
least MSE. Generally Wiener filtering represents optimum processing when a
mean-squared criterion is assumed to be optimal and the processing is performed
independent of the human visual system.
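
The restoring matrix and its error measure can be evaluated for a small one-dimensional example. The sketch assumes NumPy; the blur matrix, the exponential form of the picture covariance, and the noise level are hypothetical choices made only to obtain well-conditioned matrices.

```python
import numpy as np

def wiener_matrix(B, K_f, K_n):
    """Restoring matrix W = K_f B^T (B K_f B^T + K_n)^{-1}."""
    return K_f @ B.T @ np.linalg.inv(B @ K_f @ B.T + K_n)

def expected_error(W, B, K_f, K_n):
    """Error measure e = tr[(WB - I) K_f (WB - I)^T + W K_n W^T]."""
    R = W @ B - np.eye(K_f.shape[0])
    return np.trace(R @ K_f @ R.T + W @ K_n @ W.T)

n = 6
idx = np.arange(n)
first_col = np.array([0.6, 0.2, 0.0, 0.0, 0.0, 0.2])   # hypothetical blur
B = np.array([[first_col[(r - c) % n] for c in range(n)] for r in range(n)])
K_f = 0.9 ** np.abs(idx[:, None] - idx[None, :])       # assumed picture covariance
K_n = 0.01 * np.eye(n)                                 # assumed white noise

W = wiener_matrix(B, K_f, K_n)
e_wiener = expected_error(W, B, K_f, K_n)
e_inverse = expected_error(np.linalg.inv(B), B, K_f, K_n)
```

The error of the optimum filter does not exceed that of the inverse filter, and setting the noise covariance to zero reduces W to the inverse of B, as noted in the text.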



[Figure 5. Optimal Filtering. Block labels: imaging system, optimal filter]


3.3 CONSTRAINED DECONVOLUTION


This method allows the experimenter to search for the optimum restoration by
using a digital computer in an interactive manner [23]. The experimenter
specifies certain parameters and tolerances, on the basis of which a
restoration is produced. If the result is not satisfactory, the experimenter may
try a new set of specifications based on his previous observations and on his
subjective evaluation of the obtained restoration. The use of introspective
evaluation in a feedback loop can be easily accomplished in the SMIP system [24].

The method is based on the vector form (36) of the convolution relation. The 
effect of noise and a priori information is introduced by specifying lower and 
upper limits for each component of these vectors. A nice feature of this method 
of restoration is that the constraints prevent the restored image from having 
negative values. None of the other methods can guarantee this in the presence 
of noise. 


3.4 CORRECTION TABLES 

Rectification of geometric distortion and photometric errors which are spatially 
and temporally stable involves changing coordinates and brightness values 
throughout the image. Corrections can be tabulated for selected points and 
suitable interpolation formulas used at intermediate points. Displacements for 
straightening the image coordinate raster can be obtained using a two-dimensional 
test grid. Photometric errors can be corrected by using the results for uniformly 
illuminated test fields. In most cases the correction involves nonlinear,
position-dependent operations [25]. This restoration technique applies to
degradations not described by the linear position-invariant image forming model
which is the basis for the other restoration methods. The program GEOM in the
VICAR system provides a capability of correcting for distortion in the location
of picture elements.
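The use of a correction table with interpolation at intermediate points can be sketched as follows. The regular grid of tabulated points, the bilinear interpolation formula and the function name are assumptions made for illustration; the text does not say which interpolation formula GEOM uses.

```python
def interpolate_correction(table, spacing, x, y):
    """Bilinearly interpolate a tabulated correction at position (x, y).

    table[r][c] holds the correction measured at the grid point
    (x = c * spacing, y = r * spacing). The regular grid and the
    bilinear formula are assumptions of this sketch; the table must
    have at least 2 rows and 2 columns and (x, y) must lie inside it.
    """
    # Index of the grid cell containing (x, y), clamped so that the
    # four surrounding grid points always exist.
    c = min(int(x // spacing), len(table[0]) - 2)
    r = min(int(y // spacing), len(table) - 2)
    # Fractional position inside the cell, in the range 0..1.
    u = x / spacing - c
    v = y / spacing - r
    top = (1 - u) * table[r][c] + u * table[r][c + 1]
    bottom = (1 - u) * table[r + 1][c] + u * table[r + 1][c + 1]
    return (1 - v) * top + v * bottom


# Hypothetical table of horizontal displacements on a 2 x 2 test grid.
dx_table = [[0.0, 2.0],
            [4.0, 6.0]]
center_shift = interpolate_correction(dx_table, 10, 5, 5)
```

The same routine can serve for geometric displacements and for photometric correction factors, since both are tabulated at selected points in the same way.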


3.5 OTHER METHODS 

In purely mathematical terms, the restoration of a degraded photograph is equiva- 
lent to the solution of a Fredholm integral equation of the first kind. This kind of 
equation arises in many physical problems and various methods have been pro-
posed for numerically solving such equations [26, 27].


4. IMAGE ENHANCEMENT


An optimally restored picture may not be the most efficient form for visual data 
which are to undergo further processing and interpretation. Nonrestorative pre- 
processing may be necessary to aid either human or automatic photo interpreta- 
tion. The goal of image enhancement is to improve the quality of an image for 
human viewing without recourse to knowledge of degrading phenomena. Thus, 
enhancement operators change the image rather than restore it. Both mathe- 
matical and heuristic techniques are utilized in this process with the emphasis 
on the "viewing" of the image for extraction of information that may not have 
been so readily apparent in the original. Since the most common method for 
evaluating enhancement results is subjective human evaluation, understanding 
the evaluation process is also important for each enhancement application. 

Image quality is, however, difficult to evaluate as was discussed in section 2.5. 

The problem is to model the human viewing process such that an optimum dis- 
play can be computed for the human as a communication sink. Although work in 
this area is limited, some pleasing results have been obtained by certain com- 
binations of logarithmic, linear and exponential processes. The problem is that 
no viable fidelity criterion for the human eye has been developed. If such a
criterion were known, then the power of information and communication
theory could be directed toward optimal use of the channel between the image
and man.

The heuristic techniques try to attenuate or discard irrelevant material and at 
the same time to emphasize or clarify features and objects of interest. Attenua- 
tion is achieved by smoothing or integrating operators, but these also tend to 
blur detail. Emphasis is usually accomplished by sharpening or differentiating 
operators, or by feature or contrast enhancers, but these also tend to accentuate 
noise. Smoothing without obliteration of the relevant and sharpening without 
amplification of the irrelevant are the desired ends [28].

Most such techniques are linear and position-invariant and fit well into linear
systems theory. Therefore, enhancement of regions and contours can be described 
in spatial or spectral terms. Pictures are convolved with suitable spatial masks 
or their spectra are multiplied by frequency filters. While restoration deals with 
entire spectra and means of obtaining well-behaved inverse operators, enhance- 
ment deals with parts of spectra. 

A human interpreter using adequately enhanced images can in many cases per-
form a successful classification of pictorial data without the use of time-consuming
mathematical-statistical classification procedures. The various enhancement
techniques can be organized as follows:

1. Spatial filtering
2. Frequency filtering
3. Contrast and contour enhancement
4. Histogram equalization
5. Color enhancement
6. Other enhancement techniques


Spatial filtering and frequency filtering are linear techniques based on the con- 
volution integral (10) and the convolution theorem. The other methods involve 
linear and nonlinear operations and heuristics. 


4.1 SPATIAL FILTERING 

4.1.1 Smoothing of Regions 

Smoothing can be used in order to suppress noise that is present in a picture. 
Primitive smoothing operations use convolution (spatial masking). If the digital 
picture is represented by an M x N matrix 


    F = \{ f_{ij} \mid 0 < i \le M,\ 0 < j \le N \}                  (57)

and the mask by the matrix

    W = \{ w_{ij} \mid 0 \le |i| \le m,\ 0 \le |j| \le n \}          (58)

with m < M, n < N, then the transformed image G is produced by convolving W
with F. The elements of G are weighted local averages or finite differences de-
pending on the algebraic signs of the mask weights:

    g_{ij} = \sum_{p=-m}^{m} \sum_{q=-n}^{n} w_{pq} f_{i+p,\,j+q},
             \qquad m < i \le M - m,\ n < j \le N - n                 (59)

For border elements with subscripts 1 ≤ i ≤ m or M - m < i ≤ M and 1 ≤ j ≤ n
or N - n < j ≤ N special provisions have to be made.

When mask entries are identical, each point of the transformed image is a simple 
average of original values in its neighborhood. This is just the convolution of the 
picture with a function that has the value 1/A inside the neighborhood and 0 out- 
side, where A is the area of the neighborhood. 
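The weighted local sum of (59) and the equal-weight mask can be sketched as follows. The function names are hypothetical, indices are zero-based as usual in Python, and copying the border elements unchanged is only one possible choice for the special provisions at the border.

```python
def apply_mask(F, W):
    """Weighted local sum of F with mask W for all interior points.

    F is a list of M rows with N grey values each; W has (2m+1) rows
    and (2n+1) columns. Border elements are copied unchanged, which is
    an assumption of this sketch.
    """
    M, N = len(F), len(F[0])
    m, n = len(W) // 2, len(W[0]) // 2
    G = [row[:] for row in F]  # border values default to the input
    for i in range(m, M - m):
        for j in range(n, N - n):
            G[i][j] = sum(
                W[p + m][q + n] * F[i + p][j + q]
                for p in range(-m, m + 1)
                for q in range(-n, n + 1)
            )
    return G


def averaging_mask(m, n):
    """Equal-weight mask with value 1/A, A being the neighborhood area."""
    area = (2 * m + 1) * (2 * n + 1)
    return [[1.0 / area] * (2 * n + 1) for _ in range(2 * m + 1)]


# A single bright point is spread over its 3 x 3 neighborhood.
picture = [[0, 0, 0],
           [0, 9, 0],
           [0, 0, 0]]
smoothed = apply_mask(picture, averaging_mask(1, 1))
```

With the 3 x 3 averaging mask the central value 9 is replaced by the neighborhood mean, which is approximately 1.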

When the mask entries are unequal, the average is weighted. The weighting pat- 
tern is to reflect specific noise and correlation characteristics of the image. 
Weights that decrease monotonically from the central position are most commonly
used. The larger the mask size, the more homogeneous is the smoothed image,
but the poorer the rendition of fine detail and significant transitions. The mask
size has to be smaller than the smallest detail to be retained. It is desirable to
introduce procedures in which the decision to smooth or not to smooth and the
nature of the smoothing vary from one point to another. The simplest class of such
operations combines averaging with thresholding. If the average grey value in
some neighborhood of a point exceeds its grey value by more than a threshold t_1,
then the grey value is replaced by the average; if the difference exceeds a second
threshold t_2, the point is not smoothed [7]. This procedure can be used to clean
up "pepper and salt" noise
(isolated dark points in light regions and vice versa). Good results have been 
obtained for even small (3 x 3) neighborhoods. Smoothing can be accomplished 
with the FILTER program in the SMIP system. 
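Averaging combined with thresholding can be sketched as follows for a 3 x 3 neighborhood. Several details are assumptions of this sketch and are not taken from the text or from the FILTER program: the neighborhood average excludes the point itself, the absolute difference is used so that isolated dark and isolated light points are treated alike, and border points are left unchanged.

```python
def threshold_smooth(F, t1, t2=None):
    """Replace a point by its 3 x 3 neighborhood average only when the
    average differs from the point by more than t1 (and, if t2 is given,
    by no more than t2). Border points are left unchanged."""
    M, N = len(F), len(F[0])
    G = [row[:] for row in F]
    for i in range(1, M - 1):
        for j in range(1, N - 1):
            # Average of the eight neighbors, excluding the point itself.
            total = sum(F[i + p][j + q]
                        for p in (-1, 0, 1) for q in (-1, 0, 1)) - F[i][j]
            avg = total / 8.0
            diff = abs(avg - F[i][j])
            if diff > t1 and (t2 is None or diff <= t2):
                G[i][j] = avg
    return G


# One isolated dark point ("pepper") in a uniform light region.
noisy = [[10] * 5 for _ in range(5)]
noisy[2][2] = 0
cleaned = threshold_smooth(noisy, t1=5)
```

In this example only the isolated point is replaced; its neighbors differ from their local averages by less than t1 and keep their original values.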

4.1.2 Enhancement of Edges (Sharpening)

Enhancement of edges is an important technique in image processing since 
borders contain a significant portion of the useful pictorial information. The 
most effective non-restorative sharpening methods use some type of spatial
differentiation. If sharpening is required only in some particular direction, the
directional derivative can be used. Sharpening in every direction can be accom-
plished by taking the derivative in the gradient direction [28]. This maximal
directional derivative is equal to the square root of the sum of the squares of the
derivatives in any pair of orthogonal directions. For a digital picture (57) the
crudest approximation would be


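    |\nabla f|_{ij} \approx \sqrt{ (f_{i+1,j} - f_{ij})^2 + (f_{i,j+1} - f_{ij})^2 }      (60)

Another useful combination of derivatives is the Laplacian

    \nabla^2 f = \frac{\partial^2 f}{\partial x^2} + \frac{\partial^2 f}{\partial y^2}

For a digital picture the approximation using an 8-element neighborhood is

    (\nabla^2 f)_{ij} = 8 f_{ij} - ( f_{i-1,j-1} + f_{i-1,j} + f_{i-1,j+1} + f_{i,j-1}
                        + f_{i,j+1} + f_{i+1,j-1} + f_{i+1,j} + f_{i+1,j+1} )    (61)

yielding the mask

    W_L = \begin{pmatrix} -1 & -1 & -1 \\ -1 & 8 & -1 \\ -1 & -1 & -1 \end{pmatrix}    (62)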


The Laplacian is, thus, approximated by convolving the picture with a mask
having a positive peak that is surrounded by a negative annular valley, with the 
values chosen so that the integral of the mask is zero. The convolution of such 
a mask with the picture is zero in regions where the picture is constant or linear 
but not at edges across which the second derivative is nonzero. 
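The behaviour described above can be checked with a small sketch. It assumes the 8-neighbor mask with a central weight of 8 and surrounding weights of -1 (a positive peak surrounded by a negative valley, with the weights summing to zero); the function name and the two test pictures are hypothetical.

```python
LAPLACIAN_MASK = [[-1, -1, -1],
                  [-1,  8, -1],
                  [-1, -1, -1]]


def laplacian(F):
    """Apply the 3 x 3 mask to every interior point; the border stays 0."""
    M, N = len(F), len(F[0])
    G = [[0] * N for _ in range(M)]
    for i in range(1, M - 1):
        for j in range(1, N - 1):
            G[i][j] = sum(LAPLACIAN_MASK[p + 1][q + 1] * F[i + p][j + q]
                          for p in (-1, 0, 1) for q in (-1, 0, 1))
    return G


ramp = [[2 * j for j in range(5)] for _ in range(5)]  # linear in j
step = [[0, 0, 0, 10, 10] for _ in range(5)]          # vertical edge

ramp_response = laplacian(ramp)  # zero everywhere
step_response = laplacian(step)  # nonzero only next to the edge
```

For the linear ramp every output value is zero, while for the step the interior rows read 0, 0, -30, 30, 0: a negative and a positive excursion on either side of the edge.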

The weakness of these techniques is their overemphasis of accidental fluctua-
tions. The spatial mask for sharpening must be larger than the largest differen-
tial feature to be retained. Edge enhancement with the SMIP system can be
accomplished with the FILTER program or with LAPLACE for the mask W_L.


4.2 FREQUENCY FILTERING 

Two-dimensional spectral analysis can be used to systematically describe the 
effects of spatial masks in terms of resolution and noise. Since an image can be
represented by a series of orthogonal functions, it may be easier to deal with
the coefficients in the series than with the original picture. The Fourier
transform is the most important functional representation used. Each mask is
dual to a filter in the frequency domain and vice versa; hence masks can be
classified according to the action of the corresponding frequency filters. A
qualitative classification into 1) low pass, 2) high pass, 3) low emphasis, 4) high
emphasis and 5) band pass filter is convenient. Masks should be symmetric
about the central element if isotropy is desired and asymmetric if directional
information is to be emphasized.


The function of the low pass filter is to retain low frequency information and 
reject high frequencies. Thus, low pass filtering is related to smoothing. The 
problem is to select the cutoff frequency so that the filter is narrow enough to 
remove noise and wide enough to preserve detail. The purpose of the high pass 
filter is to remove low frequencies corresponding to background and to enhance 
edges. Thus, high pass filtering is related to sharpening. A bandpass filter 
retains frequencies between two limits and is equivalent to a low pass, high pass 
pair. High and low emphasis filters try to compensate for spectral losses and 
to stretch contrast, but not at the risk of disregarding useful information. The 
simplest low emphasis filter is an equal weight mask in the spatial domain. In 
the frequency domain this mask corresponds to a sin x/x filter. One of the most
useful image enhancement filters is the high frequency emphasis filter. This
filter passes a certain amount of low frequency information and emphasizes the
high frequency information producing an image with lower contrast but enhanced
edge information. The program FREQFILT in the SMIP system can be used to
apply a variety of filters to the Fourier transform of an image.


4.3 CONTRAST AND CONTOUR ENHANCEMENT 

4.3.1 Grey Scale Transformations

Selective contrast enhancement can often be obtained by transformation of the
grey scale or by grey scale requantization. Generally, quantization schemes
involve a fixed number of grey levels apportioned uniformly over the range of
grey values. This method can create false contours and may make the quantized
approximation of the original picture unacceptable, since real objects in the
picture may be concealed [7].

If the grey scale statistics of the image can be determined, it is possible to
quantize optimally by using finely spaced quantization levels in the most informa-
tive part of the scale. This increases the average accuracy of the quantization
without increasing the number of levels.

The choice of quantization levels can be made to depend on the nature of the
pictures being analyzed. The eye is relatively poor at estimating the grey levels
immediately adjacent to a sharp edge on a picture, so that coarse quantization
can be used near such edges. If too few quantization levels are used in a smooth
region of the picture, the transitions from level to level will show up conspicu-
ously as false contours. The SMIP system provides commands for a dynamic
selection of quantization levels.

Another type of contrast enhancement is effected by linear or nonlinear trans- 
formations of the grey scale. Portions of the grey scale may be stretched or 
compressed. Contrast enhancement by linear stretching over the brightness 
range has its limit if there is already a large difference in brightness between 
light and dark portions of the picture. If, however, a picture is examined in each 
local region for fine variations, and only those brightness values near the fine 
structure are stretched, then the details can be brought out without the satura-
tion caused by indiscriminate contrast stretching [25]. The program STRETCH
may be used to perform linear and nonlinear transformations of the grey scale.
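A linear and a nonlinear transformation of the grey scale can be sketched as follows. The 0 to 255 output range, the function names and the choice of a cube-root curve as the nonlinear example are assumptions of this sketch and do not describe the STRETCH program.

```python
def linear_stretch(value, low, high, out_max=255):
    """Map [low, high] linearly onto [0, out_max], saturating outside."""
    if value <= low:
        return 0
    if value >= high:
        return out_max
    return round((value - low) * out_max / (high - low))


def cube_root_stretch(value, in_max=255, out_max=255):
    """Nonlinear stretch that expands the lower part of the scale."""
    return round(out_max * (value / in_max) ** (1.0 / 3.0))


# A low-contrast picture occupying only the grey values 40..90.
row = [40, 50, 65, 90]
stretched = [linear_stretch(v, 40, 90) for v in row]
```

Values below the lower limit or above the upper limit saturate at the ends of the output range, which is the saturation effect of indiscriminate stretching mentioned above.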

4.3.2 Enhancement of Contours and Curves

Contour information is usually extracted after edge enhancement. Connected 
regions can be assigned continuous connected borders which are tracked and 
encoded. It may be desirable to smooth the borders by removing points which 
cause unnecessary perturbations or sharp variations in curvature. Curves can 
be approximated by interpolating straight-line segments between sample points. 

A type of polygonal approximation is known as chain encoding [29]. A polygonal
contour of minimal length can be fitted to a closed contour by using nonlinear
programming [30].

Contours are usually extracted by thresholding the original picture or transfor-
mations of it or by using derivatives. The resultant contours are often thick or
discontinuous. The enhancement problem is to provide continuation and closure
by filling gaps and eliminating perturbations and by thinning. Thinning processes
are to reduce connected elongated objects to line-like representations which
preserve connectivity. Many algorithms have been suggested [31, 32].


4.4 HISTOGRAM EQUALIZATION


This enhancement technique uses as many grey levels as possible for the display
of an image by uniformly distributing the grey values. The technique is equiva-
lent to a nonlinear position-invariant operator on the intensity scale. Equaliza-
tion is useful for images with histograms heavily biased toward one end of the
grey scale range. It makes subtle changes evident in regions in which
such changes occur most frequently, while losing subtle intensity changes in
other regions [33]. The program SPREAD in the SMIP system performs histo-
gram equalization on an image.
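One common way to distribute the grey values uniformly is to map each grey value through the scaled cumulative histogram. The following sketch takes this approach; it is an assumed implementation for illustration and not a description of the SPREAD program.

```python
def equalize(image, levels=256):
    """Histogram-equalize a picture given as a list of rows of integers
    in the range 0 .. levels - 1."""
    hist = [0] * levels
    for row in image:
        for v in row:
            hist[v] += 1
    total = sum(hist)
    # Cumulative distribution, scaled to the output range.
    mapping, running = [0] * levels, 0
    for v in range(levels):
        running += hist[v]
        mapping[v] = round((levels - 1) * running / total)
    return [[mapping[v] for v in row] for row in image]


# A picture whose grey values are crowded at the dark end of the scale.
dark = [[0, 1],
        [2, 3]]
spread = equalize(dark)
```

The four crowded grey values are moved apart over the whole scale while their order is preserved; the mapping depends only on the grey value and not on the position in the picture.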


4.5 COLOR ENHANCEMENT 

The human eye is able to separate more colors than brightness levels. There- 
fore, image processing techniques should include color images. One specific 
technique, called pseudo color, involves the use of color in the presentation of 
intensity images. The objective is to aid human interpretation by representing 
a black-and-white image in color and taking advantage of the human visual sys- 
tem to perceive color differences. This technique actually generates color 
where none exists. 

For a typical black-and-white image scene the average observer can distinguish 
simultaneously only about 15 to 20 grey-scale steps from black to white. The
use of black-and-white images has the effect of restricting the operation of the 
visual system to the vertical axis of its color perception space (Figure 6). 

Hence, the ability of the visual system to distinguish many hues and saturations 
at each brightness level is unused. Clearly, the visual system using simultaneous 
brightness and chromatic variation can distinguish many more levels of informa- 
tion than one using the grey scale alone. It can also more readily recognize 
patterns of constant density when these are replaced by a given color through a 
pseudo-color process [48].

Figure 6. Color Perception Space (vertical axis label: WHITE)

The enhancement of images by pseudo-color should be carefully distinguished
from false-color enhancement schemes. False-color, like true-color, utilizes
multispectral information from the original scene, but the wavelength-band is
not necessarily restricted to the visible spectrum. In pseudo-color, the situation
is fundamentally different. The energy from any point in the object scene is
spectrally nonselective; it differs from other points only in intensity. The
pseudo-color process transforms the recorded black-and-white image into color
in a manner that controls the relationship between the color in the final image
and the corresponding intensity in the original. Proper choice of this relation-
ship permits full use of the abilities of the human visual system to use hue and
saturation in addition to brightness for the purpose of discrimination. Such a
technique is described in [45]. The program FOTO in the SMIP system trans-
forms a picture into a form suitable for generating a pseudo-color image on the
EIS machine.
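A controlled relationship between intensity and color can be sketched as follows. The particular mapping, in which hue runs from blue for dark points to red for bright points at full saturation, is an arbitrary assumption chosen for illustration; it is neither the technique of [45] nor the mapping used by FOTO.

```python
import colorsys


def pseudo_color(intensity, max_intensity=255):
    """Map a grey value to an (R, G, B) triple by letting hue run from
    blue (dark) to red (bright) at full saturation and brightness."""
    fraction = intensity / max_intensity
    hue = (2.0 / 3.0) * (1.0 - fraction)  # 2/3 = blue, 0 = red
    r, g, b = colorsys.hsv_to_rgb(hue, 1.0, 1.0)
    return round(255 * r), round(255 * g), round(255 * b)


darkest = pseudo_color(0)      # blue
brightest = pseudo_color(255)  # red
```

Because the hue changes continuously with intensity, neighboring grey values that are hard to separate in a black-and-white display are assigned visibly different colors.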

Another enhancement technique, using false color, can be used to estimate the 
gradient in each component of the vector-valued picture function. Summing the 
magnitudes of the estimates will enhance boundaries between regions of different 
colors even if regions had the same brightness in the black and white pictures. 
Extraction algorithms can also be extended to vector-valued picture functions. 


4.6 OTHER ENHANCEMENT TECHNIQUES 

Subtraction of images is a useful method for amplifying differences between two 
pictures. Subtraction of an unprocessed picture from a filtered picture, for ex- 
ample, can be used to evaluate the filter itself when changes produced by the 
filter are rather subtle. After registration, subtraction can be applied to detect 
temporal changes in images of the same scene. 

Another technique which has been found useful is forming the ratios of pictures. 
This process largely eliminates the brightness components of the original pic- 
tures and produces a color display whose color variations are more indicative 
of material variations than the simple pseudo color displays. 
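The way in which a ratio removes the brightness component can be illustrated with a short sketch. The function name, the scale factor and the treatment of zero denominators are assumptions of this sketch.

```python
def ratio_picture(x, y, scale=1.0, fill=0.0):
    """Element-wise ratio scale * x / y of two registered pictures;
    points where y is zero receive the value `fill`."""
    return [[scale * a / b if b != 0 else fill for a, b in zip(rx, ry)]
            for rx, ry in zip(x, y)]


# The same material seen in two spectral bands, once fully illuminated
# and once at half brightness: both points give the same ratio.
band_a = [[40, 20]]
band_b = [[20, 10]]
ratio = ratio_picture(band_a, band_b)
```

Both picture points yield the ratio 2.0, so the difference in illumination no longer appears in the result.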

The program PICFUNC in the SMIPS library allows the function

    z = \frac{Ax + By + C}{Dx + Ey + F} \, (Gx + H)^{I} + J           (63)


to be performed on two input pictures whose digital intensity values are x and y,
respectively; z are the elements of the result picture.

The color pictures on the following three pages show the effect of enhancement
operations on an ERTS-1 MSS5 image of the Great Salt Lake area. In the first
picture the original image (upper left) has relatively low contrast and was linearly
contrast stretched to the full range for visual inspection (upper right). High fre-
quency detail is already enhanced by this simple stretching operation. A non-
linear stretch with a cube-root function was performed to enhance detail in the
lower range of the intensity scale (lower right). Comparison of the stretched
images with the original reveals considerable detail not visible in the original.

The picture with the histogram equalized (lower left) also shows increased con-
trast over most of the range. A uniform stretching operation was used in order
to have a uniform color presentation throughout a series of pictures.

The second color picture shows results of spatial filtering the same ERTS-1 MSS5 
image. The picture pair in the left half represents the highpass and lowpass com-
ponents after filtering with a 5 x 5 filter with unit weights. The lowpass cutoff
frequency is f_c = 1.75 km^-1; therefore, higher frequency components are removed
from the lowpass image. The highpass image is obtained by subtracting the low-
pass signal from the original and adding a constant. The image pair in the right
half shows the highpass and lowpass components after filtering with a 51 x 51
filter with unit weights. Here all frequency components above f_c = 0.172 km^-1
are removed from the lowpass image which shows only the global background 
variation. In comparison, the highpass filtered image reveals fine edge details. 
Filtering was performed with the program FASTFIL2 in the VICARS library. 
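The construction of the highpass component from the lowpass component can be sketched in one dimension as follows. The function names, the treatment of the end points and the offset value are assumptions of this sketch and do not describe FASTFIL2.

```python
def box_lowpass_1d(signal, width):
    """Equal-weight moving average of odd width; ends are left unchanged."""
    half = width // 2
    out = list(signal)
    for i in range(half, len(signal) - half):
        out[i] = sum(signal[i - half:i + half + 1]) / width
    return out


def highpass_from_lowpass(signal, lowpass, offset):
    """Original minus lowpass plus a constant that keeps values displayable."""
    return [s - l + offset for s, l in zip(signal, lowpass)]


line = [10, 10, 10, 40, 40, 40]  # one picture line with an edge
low = box_lowpass_1d(line, 3)
high = highpass_from_lowpass(line, low, offset=64)
```

The lowpass line shows the edge smeared over its neighbors, while the highpass line is constant at the offset except for a negative and a positive excursion at the edge.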

The third picture shows at the top false color images of bands 4, 5 and 7 and 
the ratios 6/4, 7/5 and 7/6 of the same ERTS scene. Below, the ratios 6/4 and
7/5 are shown in pseudo color. The program PICFUNC was used to calculate 
the ratios. 

The black and white image represents the two dimensional Fourier transform of 
a 512 by 512 area starting at line 685 and column 100 of the original ERTS-1 
MSS5 image. The spatial frequencies along the coordinate axes arise from the 
imager boundaries. The two other pronounced frequency directions are ortho-
gonal respectively to the shore lines and to the feature which crosses the lake.

These pictures demonstrate the flexibility of VICARS and SMIPS for image ma-
nipulation. The individually stretched images were combined with CONCAT and
pseudo color enhanced with FOTO for the E.I.S. film writer. For the two other
pictures pseudo color processing was performed with COLFILT. The blue,
green and red images were inserted into background pictures with OPTINS, con-
verted to film on the OPTRONICS film writer and photographically superimposed.
The entire analysis, from the original pictorial data to the final color picture, can
be performed without any programming knowledge using only the operations pro-
vided by the system.


5. OBJECT EXTRACTION


The role of object extraction in image processing is to find a description of a 
picture in terms of appropriate picture subsets (objects) and to specify proper- 
ties of these subsets. Specifying a subset of a picture is equivalent to specifying 
its characteristic function, i.e., the function whose value is 1 at the points of the 
subset and 0 elsewhere. Two types of processing, detection and articulation 
(segmentation), are commonly subsumed under object extraction. Detection is 
concerned with making decisions about the presence or absence of specific ob- 
jects and estimating their position in the image. Articulation is concerned with 
the assignment of boundaries to objects in the picture. 


5.1 OBJECT DETECTION 

Detection refers to locating objects which are prespecified in terms of either 
local spatial distribution of grey values (template) or a transform of such a dis- 
tribution (filter). Detection involves finding optimum matches between templates 
and images. The templates or filters can be selected intuitively, by random
generation and systematic evaluation [36], by adaptive learning from sets of
training images [37,38] or by design according to the principles of statistical
communication theory [18,39]. The main mathematical tools for object detec-
tion are correlation, optimum frequency filtering and other transformations such
as the Hadamard transform [40].

A picture or some transform is compared with elements from a library of 
representative features and objects or their appropriate transforms. For each 
feature a figure of merit is computed at each picture point which indicates the 
degree of match for that feature. These picture transformations can be thresh- 
olded and converted to a map locating instances of the features in the picture. 

5.1.1 Cross Correlation 

One measure of how well a portion of a picture matches a template can be defined 
as 


    M(m, n) = \sum_i \sum_j | g(i, j) - t(i - m, j - n) |            (64)

where g(i, j) is the digital picture, t(i, j) is the template and i, j are such that
i - m and j - n are in the domain of definition of the template. This definition
amounts to computing M(m, n) for all template translations (m, n) and noting
those translations for which M(m, n) is small.
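The search over all template translations can be sketched as follows for the absolute-difference measure M(m, n). The function names are hypothetical, indices are zero-based, and only translations that keep the template entirely inside the picture are tried.

```python
def mismatch(g, t, m, n):
    """Sum of |g(i, j) - t(i - m, j - n)| over the translated template."""
    return sum(abs(g[i + m][j + n] - t[i][j])
               for i in range(len(t)) for j in range(len(t[0])))


def best_match(g, t):
    """Return (m, n, M) for the translation with the smallest mismatch,
    trying every translation that keeps the template inside the picture."""
    rows = len(g) - len(t) + 1
    cols = len(g[0]) - len(t[0]) + 1
    candidates = ((mismatch(g, t, m, n), m, n)
                  for m in range(rows) for n in range(cols))
    value, m, n = min(candidates)
    return m, n, value


picture = [[0, 0, 0, 0],
           [0, 5, 9, 0],
           [0, 9, 5, 0],
           [0, 0, 0, 0]]
template = [[5, 9],
            [9, 5]]
location = best_match(picture, template)
```

For this picture the mismatch is zero at the translation (1, 1), where the template coincides exactly with the picture values, and positive at every other translation.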

Another measure of similarity is based on the Euclidean distance between two 
vectors. This leads to the cross-correlation between two functions g and t, de- 
fined by 


    R(m, n) = \frac{ \sum_i \sum_j g(i, j) \, t(i + m, j + n) }
                   { \sum_i \sum_j t^2(i + m, j + n) }                (65)


where the sum is taken over all i and j within the domain of the translated tem- 
plate. The picture and the template are declared similar when the cross- 
correlation is large, the peak height being a measure of degree of match. How- 
ever, numerical results are sensitive to changes in orientation, mean grey value, 
contrast and noise making the detection of the peak difficult. 

The optimal spatial mask for a feature is dependent not only on its spatial pat-
tern, but also on its internal spatial correlations [41]. The simple correlation
measure, however, ignores the spatial relationship of points within each image.

A statistical correlation measure can be defined as 


    R_s(m, n) = \frac{ \sum_i \sum_j p(i, j) \, t(i + m, j + n) }
                     { \sum_i \sum_j p^2(i, j) \; \sum_i \sum_j t^2(i + m, j + n) }      (66)

where p(i, j) is obtained by convolving the sampled image g(i, j) with the filter
function D(i, j). Thus,

    p = g \ast D                                                     (67)

It can be shown [41] that for a noise free Markov process image (an image
with a covariance matrix of the form [ρ^|i-j|], where ρ is the correlation
coefficient between row elements of g), the optimal spatial filter is given by


    D = \begin{pmatrix}
          \rho^2            & -\rho(1 + \rho^2) & \rho^2            \\
          -\rho(1 + \rho^2) & (1 + \rho^2)^2    & -\rho(1 + \rho^2) \\
          \rho^2            & -\rho(1 + \rho^2) & \rho^2
        \end{pmatrix}                                                (68)


For a random pattern (ρ = 0) the statistical correlation measure reduces to the
simple cross-correlation. For a highly correlated image (ρ ≈ 1) the mask be-
comes

    D = \begin{pmatrix} 1 & -2 & 1 \\ -2 & 4 & -2 \\ 1 & -2 & 1 \end{pmatrix}      (69)

This operator is the discrete approximation of the mixed fourth derivative
∂^4/∂x^2 ∂y^2. The linear preprocessing of the image prior to the application of
the basic correlation measure has been shown to provide a considerable im- 
provement in the detectability of specific patterns or image misregistrations. 
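The two limiting cases of the filter mask can be verified numerically. The sketch assumes that the corner, edge and central entries of the mask in (68) are ρ^2, -ρ(1 + ρ^2) and (1 + ρ^2)^2, respectively; the function name is hypothetical.

```python
def correlation_mask(rho):
    """3 x 3 filter mask D for the Markov image model with correlation
    coefficient rho, using the assumed entries stated in the lead-in."""
    a = rho * rho        # corner weights
    b = -rho * (1 + a)   # edge weights
    c = (1 + a) ** 2     # central weight
    return [[a, b, a],
            [b, c, b],
            [a, b, a]]


uncorrelated = correlation_mask(0)  # only the central weight remains
highly_correlated = correlation_mask(1)
```

For ρ = 0 only the central weight 1 remains, so that p = g and no preprocessing takes place. For ρ = 1 the rows of the mask are (1, -2, 1), (-2, 4, -2) and (1, -2, 1), which is the product of a second difference in each coordinate direction.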

5.1.2 Matched Filtering 

Various filtering operations have been described for image restoration and 
image enhancement. A filter that offers classification and correlation is the 
matched filter. The matched filter maximizes the signal-to-noise ratio in a 
detection process when the noise perturbation is considered additive [18]. If
the pattern f(x, y) is to be detected in the image

    g(x, y) = f(x, y) + n(x, y)                                      (70)

then the matched filter is given by

    H(u, v) = \frac{F^*(u, v)}{N(u, v)}                              (71)

where F*(u, v) is the conjugate of the Fourier transform of f(x, y) and N(u, v) is
the power spectrum of the noise process n(x, y). This concept can be generalized
to gradient-matched filters which detect on the basis of edges rather than en-
ergy [42]. The matched filter is translation but not rotation invariant; the peak
in the output plane determines the location of the object to be detected.


5.2 ARTICULATION OF IMAGES 

Articulation is concerned with the optimal assignment of boundaries to objects. 
The nontriviality of this task is apparent from inspection of multilevel digital 
pictures (see Figure 7). The problem is to find a description which refers to
various subsets (objects) of the picture and specifies properties of these subsets.
An image processing system must be capable of singling out the appropriate
picture subsets. This process is also called segmentation. There is no univer-
sal method of segmenting a picture. Many different types of subsets can be ob-
jects, depending on the type of description that is required [8].

5.2.1 Thresholding 

The basic method used for singling out a subset of a picture is to obtain the sub- 
set by thresholding. Object extraction by thresholding can be used on original 
pictures or on transformed (preprocessed) pictures. Specifically, for any pic-
ture g and any picture operation T that transforms pictures into pictures the ob-
ject can be taken as the set of points on which t_1 < T(g) < t_2, where t_1 and t_2
are real constants.
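Object extraction by thresholding amounts to computing the characteristic function of the object. A minimal sketch follows; the function name is hypothetical, and the operation T is applied point by point here, which covers only pointwise picture operations.

```python
def characteristic_function(g, t1, t2, T=lambda v: v):
    """Return 1 where t1 < T(g) < t2 and 0 elsewhere. T is applied
    point by point, an assumption that excludes neighborhood operations."""
    return [[1 if t1 < T(v) < t2 else 0 for v in row] for row in g]


picture = [[3, 40, 41],
           [90, 42, 5]]
object_map = characteristic_function(picture, 30, 50)
```

In this example the three points with grey values between 30 and 50 form the object; all other points are assigned to the background.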

When thresholding is applied to the picture itself, the presumption is that regions 
of interest display different, fairly constant grey values. In many cases this may 
be a very natural approach; e.g., in earth observations, clouds, water and terrain 
will have distinguished grey values. 

When thresholding is applied to a spatially differentiated picture (see Section 4.1) 
contouring occurs. This method can be used to single out subsets that cannot be 
characterized as having a prespecified range of grey levels, but that contrast 
with their surroundings and will yield outlines of such subsets.

Although thresholding is very effective on many types of images, it has short-
comings: 1) false contouring and smoothing can occur, 2) there is no guarantee
that the resultant contours are closed curves and 3) the results are very sensi-
tive to the selection of thresholds. It is, however, often possible to select good
thresholds by examining the frequency distribution of grey levels in g. Especially
for multimodal histograms it is possible to segment the picture by choosing the
thresholds between the mode peaks. This is greatly facilitated in an inter-
active image processing system like SMIPS, where it is possible to display the
histogram on the screen and to select the thresholds immediately.

Figure 7 shows a part of the ERTS-1 MSS5 scene from the first color picture 
for which the original 128 value greyscale has been compressed to 13 levels. 

The result of thresholding this picture at the points indicated in the histogram 
in Figure 8 is shown in Figure 9. Simple thresholding has considerably enhanced 
this particular feature. 

5.2.2 Region Analysis 

Once a figure has been extracted from a picture, it becomes possible to perform 
operations in which the points of the figure are involved. Region analysis at- 
tempts to simplify a digital picture by partitioning it into a set of disjoint regions. 
Each region is composed of picture points having the same grey value and con- 
nected to each other. Region analysis involves the concepts of connectivity, 
geometry and shape [8,43] and will not be discussed in this report. 

5.2.3 Contour Following 

A recurring theme in image analysis is that a picture may be simplified by 
representing objects in the picture by their contours. The use of gradient oper-
ators for border detection has been discussed. Another technique is called con-
tour following; it involves tracing out the boundary between a figure and its
background.

Two conditions must be fulfilled if a contour following algorithm is to be suc- 
cessfully applied. First, since a picture generally has many grey levels, there 
must be some way of defining the figure whose contour is to be followed. In 
simple cases the figure can be extracted from the background by thresholding. 
This is a reduction to a binary picture. When this cannot be done, it is some-
times possible to combine a gradient operator with a contour follower [43]. The
second prerequisite for successful contour following is that the figure has no 
spurious gaps in it. This problem can sometimes be overcome by first smoothing 
the picture in order to fill small gaps. Contour following is intrinsically a serial 
problem. An error made at any step makes it more likely that succeeding steps 
will also be in error. Therefore, the applicability of contour following seems re- 
stricted to pictures with low noise levels. 
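
The serial nature of contour following can be seen in a minimal sketch for a binary picture. The code below uses Moore-neighbour tracing, which is one common contour follower and not necessarily the one intended in [43]; the names RING and follow_contour are hypothetical, and figure points are assumed to have the value 1.

```python
# Clockwise ring of the 8 neighbours as (row, column) offsets, starting at west.
RING = [(0, -1), (-1, -1), (-1, 0), (-1, 1), (0, 1), (1, 1), (1, 0), (1, -1)]

def follow_contour(binary):
    """Trace the outer boundary of the first figure met in a raster scan."""
    rows, cols = len(binary), len(binary[0])

    def is_figure(y, x):
        return 0 <= y < rows and 0 <= x < cols and binary[y][x] == 1

    start = next(((y, x) for y in range(rows) for x in range(cols)
                  if binary[y][x] == 1), None)
    if start is None:
        return []                      # no figure in the picture

    contour = [start]
    current, back = start, 0           # back: ring index of a known background neighbour
    first_move = None
    while True:
        step = None
        for k in range(1, 9):          # search clockwise, starting after the backtrack
            d = (back + k) % 8
            cand = (current[0] + RING[d][0], current[1] + RING[d][1])
            if is_figure(*cand):
                step = (cand, d)
                break
        if step is None:
            return contour             # isolated point
        nxt, d = step
        if current == start and nxt == first_move:
            break                      # the first move would be repeated: contour closed
        if first_move is None:
            first_move = nxt
        # The neighbour examined just before nxt is background; it is the new backtrack.
        py = current[0] + RING[(d - 1) % 8][0]
        px = current[1] + RING[(d - 1) % 8][1]
        back = RING.index((py - nxt[0], px - nxt[1]))
        current = nxt
        contour.append(current)
    return contour[:-1]                # drop the closing repeat of the start point

square = [[0, 0, 0, 0],
          [0, 1, 1, 0],
          [0, 1, 1, 0],
          [0, 0, 0, 0]]
boundary = follow_contour(square)
```

Each step depends on the backtrack direction left by the previous step, which is why a single wrong decision in a noisy picture tends to propagate along the rest of the trace.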

6. PICTORIAL PATTERN RECOGNITION 

The huge masses of data in the form of pictures that are being continuously 
collected present a vital need for the automatic analysis of pictorial patterns. 
Presently the ability of machines to perform perceptual tasks is, however, very 
limited. The study of a more modest problem, the assignment of patterns to one 
or several prespecified classes (classification) has led to an abstract mathe- 
matical model that provides the theoretical basis for classifier design. This 
subject of mathematical pattern recognition is covered in [43,44] which contain 
many references to previous work. The purpose of this section is only to intro- 
duce the terminology and to formulate the problem of automatic pattern classifi- 
cation. 

Mathematical pattern recognition is the study of mathematical techniques which 
support human experience in the classification of patterns. Mathematically, the 
problem is to find a function that maps a set of pictures into a set of classes. 

It is usually convenient to do this in three steps: preprocessing, feature
extraction, and classification. Figure 10 illustrates the concept.


Figure 10. Block diagram with stages labeled REAL, PATTERN, FEATURE and CLASSIFICATION.


The physical world is sensed by some sensor system which produces a representation
of that world in the form of R scalar values, where R is typically quite large. Some
preprocessing is then performed on the raw pictures, e.g., radiometric and geometric
corrections, noise cleaning, etc. Because R is large, it is desirable to
reduce the dimensionality of the pattern space while still maintaining the
discriminatory power for classification purposes which is inherent in the pictures.
In a feature space of dimensionality N (much smaller than R), classification
rules can then be computed in reasonable amounts of time. The classification
space in which one of K classes can be selected is of dimensionality K.
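
The reduction from R pattern dimensions to N feature dimensions can be illustrated with a linear projection. The sketch below uses principal components, which is only one possible choice of features and is not prescribed by this report; the function name principal_features and the random test data are assumptions made for illustration.

```python
import numpy as np

def principal_features(patterns, n_features):
    """Project R-dimensional patterns onto the N directions of largest variance."""
    patterns = np.asarray(patterns, dtype=float)
    mean = patterns.mean(axis=0)
    centered = patterns - mean
    # Rows of vt are orthonormal directions ordered by decreasing variance.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    basis = vt[:n_features]
    return centered @ basis.T, basis, mean

rng = np.random.default_rng(0)
patterns = rng.normal(size=(50, 8))          # 50 patterns, R = 8
features, basis, mean = principal_features(patterns, 2)   # N = 2
```

Whether such a projection preserves the discriminatory power depends on the problem; directions of large variance are not necessarily the directions that separate the classes.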

It should be emphasized that in many cases image enhancement and classification 
by a human interpreter can solve the same problem at considerably lower cost. 
Automatic classification without feature selection and preprocessing is time 
consuming and dangerous because noise may blur decision boundaries, resulting
in misclassification.

The essence of pattern recognition resides in the selection of few, good variables. 
Therefore, feature selection is probably the most important aspect of pattern 
recognition. Two distinct reasons for feature selection should be
noted. First, the pattern space is represented by the sensor data, and sensors are
often designed according to considerations other than classification. Thus, it does not
seem unreasonable to conjecture that there may be combinations of the dimensions
of the pattern space affording meaningful classification power which otherwise
would not be exploited. The second reason lies in the
requirement for a space in which classification algorithms can be efficiently
implemented. In the high dimensional pattern space even the simplest classification
algorithms are quite time consuming on large scale digital computers.

In addition, the selected features in the reduced space may cluster better than in 
the pattern space and will allow simpler decision rules. With improper feature 
extraction the classification algorithms will necessarily be less efficient and 
classification errors will increase. 

In many cases features seem to have been selected because of their mathematical 
tractability or ease of implementation rather than because of their suitability for 
the given classification task. The soundest approach to feature selection is to 
use knowledge about the structure of the patterns and the definition of the classes 
as a guide in choosing the features and preprocessing operations. Feature se- 
lection is much more problem dependent than classification. 

The problem of classification is basically one of separating the feature space 
into regions, one for each category. Classification has evolved to a higher state 
of refinement than feature selection. Two areas of classification can be
distinguished. In supervised learning, the correct classification of a set of
prototypes is known. The classification problem is to find separating surfaces
which correctly classify the known prototypes and which afford some degree of
confidence in correctly classifying unknown patterns.
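
A very simple instance of supervised classification is the minimum-distance rule, in which each class is represented by the mean of its prototypes and the separating surfaces are the hyperplanes midway between the means. The sketch below is an illustration under that assumption, not a method taken from [43,44]; the function names are hypothetical.

```python
import numpy as np

def train_class_means(features, labels):
    """Compute one mean vector per class from labeled prototypes."""
    features = np.asarray(features, dtype=float)
    labels = np.asarray(labels)
    classes = np.unique(labels)
    means = np.stack([features[labels == c].mean(axis=0) for c in classes])
    return classes, means

def classify(features, classes, means):
    """Assign each feature vector to the class with the nearest mean."""
    features = np.atleast_2d(np.asarray(features, dtype=float))
    dist = np.linalg.norm(features[:, None, :] - means[None, :, :], axis=2)
    return classes[np.argmin(dist, axis=1)]

prototypes = [[0, 0], [1, 0], [0, 1], [5, 5], [6, 5], [5, 6]]
prototype_labels = [0, 0, 0, 1, 1, 1]
classes, means = train_class_means(prototypes, prototype_labels)
decisions = classify([[0.5, 0.5], [5.5, 5.2]], classes, means)
```

Classifying the prototypes themselves gives a first check that the separating surfaces are at least consistent with the known data.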

In many data analysis applications such classification information is not available. 
Thus, the subject of nonsupervised learning attempts to apply recognition tech- 
niques to unclassified data. Results may be descriptions of the number of classes 
or clusters the data fall into. Once clusters are defined, the supervised learning 
techniques become meaningful. Clustering can be defined as the nonsupervised 
classification of objects which amounts to the process of generating classes 
without any knowledge of prototype classification. The essential characteristic 
is the sorting of the data into subsets such that each subset contains data points 
that are as much alike as possible. 
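
Sorting data into subsets of similar points can be sketched with an iterative nearest-center procedure (commonly known as k-means). This is a generic illustration and not the sequential clustering program of [49]; the function name kmeans, the fixed number of clusters k, and the farthest-point choice of initial centers are assumptions.

```python
import numpy as np

def kmeans(points, k, iterations=20):
    """Group points into k clusters by alternating assignment and averaging."""
    points = np.asarray(points, dtype=float)
    # Deterministic start: first point, then repeatedly the point farthest
    # from all centers chosen so far.
    centers = [points[0]]
    for _ in range(1, k):
        dist = np.min([np.linalg.norm(points - c, axis=1) for c in centers], axis=0)
        centers.append(points[np.argmax(dist)])
    centers = np.array(centers)

    assign = np.zeros(len(points), dtype=int)
    for _ in range(iterations):
        dist = np.linalg.norm(points[:, None, :] - centers[None, :, :], axis=2)
        assign = np.argmin(dist, axis=1)          # nearest center for each point
        new_centers = np.array([
            points[assign == j].mean(axis=0) if np.any(assign == j) else centers[j]
            for j in range(k)
        ])
        if np.allclose(new_centers, centers):     # assignments have settled
            break
        centers = new_centers
    return assign, centers

data = [[0, 0], [0, 1], [1, 0], [10, 10], [10, 11], [11, 10]]
assignment, cluster_centers = kmeans(data, 2)
```

No prototype classification enters the procedure; the meaning of each cluster has to be established afterwards.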

A drawback of supervised classification techniques for multispectral data is 
associated with the high variability of the spectral signatures. Application of 
supervised techniques requires, therefore, obtaining the reference signatures 
from training areas which form part of, or are near, each particular survey 
area. Even with this practice, iterations and considerable human judgement are 
required to select proper training areas such that the classification is of
acceptable accuracy [49].

Unsupervised classification groups multispectral data into a number of classes 
based on some intrinsic similarity within each class. It avoids reference signa- 
tures and the physical identification of each class is done after processing by 
checking a small area belonging to each class. Because of this reversed order 
with respect to supervised techniques, the investigator will know where to select 
the ground truth data based on the resulting classification map. 

The sequential clustering program described in [49] is available at LMES. 